[tesseract-ocr] Re: Tesseract mistakes letters for numbers

Ajinkya Bobade Wed, 11 Aug 2021 21:51:17 -0700

Hello,

To fix this you will need to fine-tune Tesseract, starting from the model 
that you currently use. That model was not trained on this specific font, 
so it approximates the letters as similar-looking digits. Take a few samples 
in the format that you need and retrain on top of the original weights. If 
you have more questions, feel free to email me.
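
As a rough illustration of the "take a few samples" step, here is a minimal 
Python sketch. It assumes the tesstrain project's Makefile workflow, where 
each line image sits next to a `.gt.txt` file holding its exact 
transcription, and where fine-tuning is started with `make training` using 
the MODEL_NAME, START_MODEL, TESSDATA and MAX_ITERATIONS variables. The 
model name `si_font`, the image name, the sample text, the tessdata path and 
the iteration count are all made up for the example, and the helper 
functions are hypothetical, not part of Tesseract or tesstrain.

```python
from pathlib import Path


def write_ground_truth(samples, out_dir):
    """Write one `<stem>.gt.txt` transcription file per line image.

    `samples` maps an image file name to the exact text shown in it.
    The images themselves must be copied into `out_dir` separately.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in sorted(samples.items()):
        gt_path = out_dir / (Path(image_name).stem + ".gt.txt")
        # One line of text per file, without surrounding whitespace.
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path)
    return written


def finetune_command(model_name, start_model="eng",
                     tessdata_dir="tessdata_best", max_iterations=400):
    """Build the assumed tesstrain `make training` call as an argument list."""
    return [
        "make", "training",
        f"MODEL_NAME={model_name}",          # name of the new model
        f"START_MODEL={start_model}",        # existing model to continue from
        f"TESSDATA={tessdata_dir}",          # directory holding the start model
        f"MAX_ITERATIONS={max_iterations}",  # illustrative value, tune as needed
    ]


if __name__ == "__main__":
    # Hypothetical sample: the real transcriptions come from your own images.
    samples = {"line_0001.png": "SI12345"}
    write_ground_truth(samples, "data/si_font-ground-truth")
    print(" ".join(finetune_command("si_font")))
```

The printed command is meant to be run from a tesstrain checkout after the 
images have been copied next to their `.gt.txt` files. Fine-tuning normally 
needs the float models (for example from tessdata_best) as the starting 
point, so check which model you are currently using before you start.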


Regards
Ajinkya
Creator of AI Scanner https://imagescanner-online.com/ 

On Thursday, 22 July 2021 at 00:07:15 UTC+5:30 [email protected] wrote:

> Update:
>
> I discovered the command line option:
>
>     -c load_number_dawg=0
>
> That did not improve my results.
>
> On Wednesday, July 21, 2021 at 1:07:15 PM UTC-5 Eric Hodges wrote:
>
>> I need some help. I have a bunch of images of text like this:
>>
>> [image: sample_si.jpg]
>> They are all 200 dpi, black and white images. In over 50% of the cases, 
>> Tesseract confuses the "SI" at the front for digits. Most of them are "51", 
>> but some are "81" or "31".
>>
>> I've tried tweaking all of the settings I can find, but none of them 
>> improve the results. I'm currently using a config file like this:
>>
>> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>>
>> Interesting fact: If I cut off the digits and only send the alphas to 
>> Tesseract, it recognizes them correctly. Is there something in Tesseract 
>> that makes it less likely to mix letters and numbers in a single word?
>>
>> Any suggestions?
>>
>
