Re: OCR old software listing

Fred Cisin via cctalk Mon, 31 Dec 2018 09:47:26 -0800

On Mon, 31 Dec 2018, Larry Kraemer via cctalk wrote:

I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's from the Multipage .tif file. While the .tif's look descent, and RasterVect shows the .tif properties to be Group 4 Fax (1bpp) with 5100 x 6600 pixels - 300 DPI, I can't get tesseract 3.x, TextBridge Classic 2.0, or Irfanview with KADMOS Plugin to OCR any of the .tif files, with descent results. I'd expect an OCR of 85 to 90 % correct conversion to ASCII text.


Software listings need more accuracy than that.
How many wrong characters does it take for a program not to work?
"desCent" isn't good enough.

85 to 90 % correct is a character wrong in every 6 to 10 characters.
How many errors is that PER LINE?
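
To put a number on the PER LINE question, here is a rough sketch of the arithmetic. The 40-character line length is an assumption picked for illustration (the listing's real line length isn't stated), and treating each character as an independent coin flip is a simplification; real OCR errors tend to cluster.

```python
# Rough arithmetic only: assumes every character is misread independently
# with the same probability, which real OCR output does not guarantee.

def expected_errors_per_line(accuracy, line_length=40):
    """Average number of wrong characters in one line of the listing."""
    return (1.0 - accuracy) * line_length

def chance_line_is_clean(accuracy, line_length=40):
    """Probability that a whole line comes through with no wrong characters."""
    return accuracy ** line_length

# 40 characters per line is a hypothetical figure, not taken from the scan.
for accuracy in (0.85, 0.90, 0.99, 0.999):
    errors = expected_errors_per_line(accuracy)
    clean = chance_line_is_clean(accuracy)
    print(f"{accuracy:.1%} correct: {errors:.2f} errors per line, "
          f"{clean:.1%} of lines error-free")
```

Under those assumptions, 90 % correct works out to about four wrong characters in every 40-character line, and fewer than 2 % of lines come through untouched.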

"But, you can start with that, and just fix the errors, without retyping the rest." Doing it that way is a desCent into madness.

BTDT.  wore out the T-shirts.

A competent typist can retype the whole thing faster than fixing an error in every six to ten characters. Only if there is less than one error for every several hundred characters does "patching it" save time for a competent typist.

In general, for a competent typist, the fastest way to reposition the cursor to the next error in the line is to simply hit the keys of the intervening letters. It is NOT to move the cursor with the mouse, then put your hand back on the keys to type a character. Using cursor motion keys is no faster for a competent typist than hitting the keys of the letters to skip over.

TIP: display the OCR'ed text that is to be corrected in a font that exaggerates the difference between zero and the letter 'O', and between one and lower case 'l'. There are some programs that will attempt to select those based on context.
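
As a rough idea of what "based on context" can mean, here is a minimal sketch. It is not how any particular OCR program does it; the rule (a standalone token made only of digits, 'O' and 'l', with at least one real digit in it, is probably a number) and the function names are assumptions made up for the illustration.

```python
import re

# Only the two confusions named in the tip: letter O vs zero, lower case l vs one.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "l": "1"})

# A standalone token built only from digits and the two look-alike letters.
LOOKALIKE_TOKEN = re.compile(r"\b[0-9Ol]+\b")

def _fix_token(match):
    token = match.group(0)
    # Hypothetical rule: treat the token as a number only if it already
    # contains at least one real digit; otherwise leave it alone.
    if any(ch.isdigit() for ch in token):
        return token.translate(LETTER_TO_DIGIT)
    return token

def fix_digit_lookalikes(line):
    """Replace 'O' and 'l' with '0' and '1' inside tokens that look numeric."""
    return LOOKALIKE_TOKEN.sub(_fix_token, line)

print(fix_digit_lookalikes("1O GOTO 2l0"))
```

A rule this simple still misses plenty: a lone 'O' or 'l' standing in for a single digit is left untouched, and it knows nothing about the language of the listing. It narrows down what you have to proofread; it does not replace the proofreading.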


--
Grumpy Ol' Fred                 ci...@xenosoft.com
