Re: If you OCR, always archive the bitmaps too - Re: Regarding Manuals

Fred Cisin Sun, 27 Sep 2015 11:34:02 -0700

On Sun, 27 Sep 2015, Pontus Pihlgren wrote:

It seems to me that a better tool could solve the issue. One that
could display the OCR:ed content only and the scanned content
only when desired, for instance when you suspect an error.
Is there such a reader? Is the content organised to make it
possible?


I haven't seen one.

I did start trying to write an heuristic probabilistic OCR one 25 years
ago. The idea being to overlay the OCR'd (displayed with matching fonts)
over the scanned content. Besides visual confirmation and indication of
probability of accuracy with each character, it lends itself well to
hiring neighborhood kids to type in just the "wrong" characters to clean
up the OCR'd file, and heuristically tune the font database, including
adding new fonts - EVERY character is "wrong" until it repeats a few times
in the document. ("clean up" a NYT article, and the OCR now has their
font).
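
A minimal sketch of the "wrong until it repeats" idea, in Python. It is not
the program described above. It assumes each scanned glyph has already been
reduced to a key (hypothetically, a hash of its normalized bitmap) paired
with the OCR's best guess, that the font database is a plain dict from key
to character, and that "a few times" means three. The names flag_for_review,
apply_corrections and MIN_REPEATS are made up for illustration.

```python
from collections import Counter

MIN_REPEATS = 3  # assumed threshold; the post only says "a few times"


def flag_for_review(glyphs, font_db, min_repeats=MIN_REPEATS):
    """Return the indexes of glyphs a human should check.

    glyphs  -- list of (key, guess) pairs; key is a hypothetical glyph
               signature, guess is the OCR's best-guess character
    font_db -- dict mapping key -> confirmed character from earlier work
    """
    counts = Counter(key for key, _ in glyphs)
    # A glyph is "wrong" unless the font database already knows it
    # or it repeats often enough within this document.
    return [i for i, (key, _) in enumerate(glyphs)
            if key not in font_db and counts[key] < min_repeats]


def apply_corrections(glyphs, font_db, corrections):
    """Store human-typed characters in font_db and return the text.

    corrections -- dict mapping glyph index -> character typed by a person
    """
    for i, ch in corrections.items():
        font_db[glyphs[i][0]] = ch  # tune the font database
    # Confirmed characters win; otherwise fall back to the OCR guess.
    return "".join(font_db.get(key, guess) for key, guess in glyphs)
```

For example, with glyphs [("g1","a"), ("g1","a"), ("g1","a"), ("g2","l"),
("g3","b")] and a font database that already holds "g3", only index 3 is
flagged. Once someone types the right character for it, the same glyph is
no longer flagged in later documents that share the database.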
