My colleagues are quite dismissive of the challenges of OCR, arguing that, since it was one of the first machine vision application areas, all the problems have been solved.
Right and wrong.
According to Winn Hardin, author of “Designing Machine Vision for Optical Character Recognition” (February 21, 2012), the first patent application for OCR was filed in 1929. That’s a fascinating piece of trivia to trot out at your next dinner party, and it shows that my colleagues were more right than even they thought: yes, OCR has been around for some eighty years. But reading further reveals that even today it’s no slam dunk.
Winn quotes Olaf Hilgenfeld of German integrator Vitronic: “…letters need to have space between them; OCR read rates on linked letters are significantly lower.” In other words, computers still struggle to interpret joined-up writing.
And then, just as I was marveling yet again at what an incredible pattern recognition system each of us carries around, I read that MVTec and integrator Eckelmann implemented a handwriting recognition system “a few years ago…”.
Which is it, guys: handwriting recognition, solved or not solved?
5 comments:
OCR is barely "solved" for applications where characters are printed under perfect conditions.
Anyone trying to read dot-peened, cast-in, stamped, or even ink-jet printing in less than pristine environments will agree.
As soon as non-character artifacts explode onto the scene, you suddenly have a problem that is by no means "pre-solved".
It's both. MVTec and Eckelmann developed an application that can read handwritten characters, but you'll notice that the characters on the labels being read are separated. HALCON offers tools for segmenting cursive text and touching characters, but this is still a very difficult task to implement robustly.
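For readers curious what that segmentation step actually involves, here is a minimal sketch of the classic vertical projection-profile approach, written in Python with OpenCV rather than HALCON's own operators: binarize the text line, count ink pixels per column, and cut at the valleys. The file name, thresholds, and widths below are placeholder assumptions. Raising the valley threshold lets the cut slice through thin bridges between lightly touching glyphs, but cursive or skewed text defeats this approach quickly, which is exactly the commenters' point.

import cv2
import numpy as np

def split_touching_chars(gray, valley_ink=3, min_char_width=8):
    """Return (x_start, x_end) ranges of candidate characters in a text line."""
    # Binarize so ink = 1, background = 0; Otsu copes with uneven print density.
    _, binary = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    profile = binary.sum(axis=0)              # ink pixels per column
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink > valley_ink and start is None:
            start = x                         # entering a character
        elif ink <= valley_ink and start is not None:
            if x - start >= min_char_width:   # ignore specks and slivers
                segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

# Usage: crop each range and hand it to the classifier of your choice.
line = cv2.imread("text_line.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
for x0, x1 in split_touching_chars(line):
    glyph = line[:, x0:x1]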
Anyone claiming that OCR is solved must not be talking about industrial settings.
Characters are easily mixed up when non-character particulates join the scene, and especially when parts of a character go missing.
Ask someone doing OCR on text that is stamped, dot-peened, cast-in, or jet-printed in any non-pristine environment on any non-pristine surface and they'll tell you that OCR is far from "solved".
Consider a jet-printed system that occasionally drops scan lines, or prints over top of debris that eventually falls off. I've seen "B"s make shocking transformations into "L"s (a quick way to reproduce that failure mode is sketched after this comment).
OCR on the OCR-A text at the bottom of a passport that you scan through a slot at the airport, sure, that's solved, but anything that needs to be rugged and robust, no.
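The dropped-scan-line failure mode described above is easy to reproduce at a desk. The sketch below is purely a hypothetical illustration, not anything from the article: it renders a single character with Pillow, whites out a few horizontal bands the way a clogged ink-jet nozzle would, and asks Tesseract (via pytesseract) what it sees. The font path, band positions, and band width are assumptions you may need to adjust for your system.

import numpy as np
import pytesseract
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, size=200, font_path="DejaVuSans-Bold.ttf"):
    """Render one dark character on a white background."""
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.8))  # font path is an assumption
    draw.text((size * 0.2, size * 0.05), ch, fill=0, font=font)
    return img

def drop_scan_lines(img, rows, band=6):
    """Simulate missing ink-jet passes by whiting out horizontal bands."""
    arr = np.array(img)
    for r in rows:
        arr[r : r + band, :] = 255
    return Image.fromarray(arr)

clean = render_char("B")
damaged = drop_scan_lines(clean, rows=[60, 100, 140])

cfg = "--psm 10"  # tell Tesseract to expect a single character
print("clean  :", pytesseract.image_to_string(clean, config=cfg).strip())
print("damaged:", pytesseract.image_to_string(damaged, config=cfg).strip())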
Don't expect the "ultimate OCR solution" to come from MVTec, NI or any other software vendor. Instead, look to the software running on your iPhone. There is some incredible stuff in the app store and it's improving at an incredible pace. The only proof I need comes from Word Lens: http://www.youtube.com/watch?v=h2OfQdYrHRs
reCAPTCHA is a perfect example of how OCR is currently not good enough: the whole scheme only works because machines still can't read distorted text that humans can.
I'm currently doing my PhD on recovering text from scene images. The OCR engine is only a small part of recognising text: text-like component detection, perceptual grouping, perspective rectification, and the recognition engine itself are all still very much open, unsolved problems.
OCR systems work very well for text that is typed and perfectly scanned. They struggle with noise from lighting or specular highlights, fonts unseen in training, and perspective distortion, to name just a few (a sketch of the rectification step follows this comment).
Great recognition systems aren't an impossible dream, for sure, but they are also nowhere near the market yet.
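To make the perspective point above concrete, here is a minimal OpenCV sketch (my own illustration, not from the comment) of the rectification step that typically sits between scene-text detection and the OCR engine: given the four corners of a detected text quadrilateral, warp it to a fronto-parallel rectangle before recognition. The image name and corner coordinates are placeholders that would normally come from a text detector.

import cv2
import numpy as np

def rectify_text_region(image, corners, out_w=400, out_h=100):
    """Warp a quadrilateral text region to a fronto-parallel rectangle."""
    src = np.array(corners, dtype=np.float32)  # corner order: TL, TR, BR, BL
    dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]],
                   dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))

img = cv2.imread("scene.jpg")                               # placeholder scene image
corners = [(120, 80), (460, 110), (450, 190), (115, 160)]   # placeholder detector output
flat = rectify_text_region(img, corners)
# 'flat' is now roughly fronto-parallel, which gives a conventional OCR
# engine a much better chance than the raw, slanted crop.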