Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-05 Thread Ben Bongalon
licly
> so going forward I can help the next person.
>
> Keith
>
> Original message
> From: Ben Bongalon
> Date: 1/5/21 11:56 PM (GMT-05:00)
> To: Keith M
> Cc: tesseract-ocr@googlegroups.com
> Subject: Re: [tesseract-ocr] advice for OCR'i

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-05 Thread Ben Bongalon
of document, DPI/resolution, font, or anything. I know I sound like a broken record. Current numbers include stats like 44% of the 100-page document is 95% or better confidence. Now those lines could still be wrong, but they look pretty decent in a quick scan.

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-05 Thread Ben Bongalon
Hi Keith, Interesting project. Having looked at the sample OCR results that Alex posted, I think the poor recognition from Tesseract is more likely due to the underlying language model used (I'm assuming you used 'eng'?). For example, the "test1" OCR results correctly transcribe the variables

[tesseract-ocr] How to generate .lstmf file with non-randomized lines

2021-01-04 Thread Ben Bongalon
Hello and Happy New Year, I am training Tesseract 4 to recognize special characters in a Philippine bilingual dictionary (specifically Hanunoo -> English). Following the "Fine Tuning" tutorial but using Spanish as the starting model, I am getting good recognition accuracy on some characters such a