Using Machine Learning to Denoise Images for Better OCR Accuracy

When working with documents generated by a computer, screenshots, or essentially any piece of text that has never been printed and scanned, OCR is far easier. There is sufficient contrast between the background and foreground, and most of the time the text does not sit on a complex background.

That all changes once a piece of text is printed and scanned. From there, OCR becomes much more challenging:

- The printer could be low on toner or ink, resulting in text that appears faded and hard to read.
- An old scanner could have been used, resulting in low image resolution and poor text contrast.
- A mobile phone scanner app may have been used under poor lighting conditions, making it incredibly challenging for human eyes to read the text, let alone a computer.

For all the amazing things the human mind can do, it seems we are all just walking accidents waiting to happen when it comes to printed materials. Give us a piece of paper and enough time, and I guarantee that even the most organized of us will take that document from pristine condition and eventually introduce stains, rips, folds, and crinkles. All too common are the clear signs that an actual human has handled the paper: coffee mug stains on the corners, crinkled pages, rips, and tears.

Inevitably, these problems will occur, and when they do, we need to use our computer vision, image processing, and OCR skills to pre-process and improve the quality of these damaged documents.
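To make the idea of pre-processing concrete, here is a minimal sketch of one classic cleanup step: a median filter, which removes the isolated salt-and-pepper specks that scanners often introduce, while preserving solid text strokes. The tiny 8x8 "page" array and the pure-NumPy implementation are illustrative assumptions for this sketch, not the actual code from this post (which uses a learned denoising model).

```python
import numpy as np

def median_denoise(img, k=3):
    """Apply a k x k median filter to a grayscale image.

    Each output pixel is the median of its k x k neighborhood,
    so isolated noise specks are replaced by the surrounding
    page color, while thicker text strokes survive.
    """
    pad = k // 2
    # Replicate edge pixels so border windows are still k x k.
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# Toy "scan": a white page (255) with a two-pixel-thick black stroke (0).
page = np.full((8, 8), 255, dtype=np.uint8)
page[3:5, :] = 0                # the "text"

noisy = page.copy()
noisy[0, 0] = noisy[5, 5] = 0   # isolated specks of scanner noise

clean = median_denoise(noisy)   # specks removed, stroke preserved
```

On this toy input the filter restores the original page exactly; on real scans a median filter only handles speckle noise, which is why the post reaches for machine learning to tackle stains, faded toner, and crinkles.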