Update:
I found the command-line option

-c load_number_dawg=0

but passing it did not improve my results.
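
For anyone trying to reproduce this, here is a minimal sketch of how the option can be passed to the tesseract command line together with the whitelist from my config file. The helper name build_tesseract_command is made up for illustration, and sending the output to "stdout" is an assumption rather than my exact invocation. It only builds the argument list; it does not run Tesseract.

```python
import shlex


def build_tesseract_command(image_path, output_base="stdout", variables=None):
    """Build the argv list for a tesseract CLI call (hypothetical helper).

    Each entry in `variables` becomes one `-c NAME=VALUE` pair.
    """
    cmd = ["tesseract", image_path, output_base]
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Same settings described in this thread: the number dawg switched off,
# plus the uppercase-and-digits whitelist from the config file.
cmd = build_tesseract_command(
    "sample_si.jpg",
    variables={
        "load_number_dawg": 0,
        "tessedit_char_whitelist": "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    },
)

# Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, capture_output=True, text=True) to run the OCR, assuming tesseract is on the PATH.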
On Wednesday, July 21, 2021 at 1:07:15 PM UTC-5 Eric Hodges wrote:
> I need some help. I have a bunch of images of text like this:
>
> [image: sample_si.jpg]
> They are all 200 dpi, black and white images. In over 50% of the cases,
> Tesseract confuses the "SI" at the front for digits. Most of them are "51",
> but some are "81" or "31".
>
> I've tried tweaking all of the settings I can find, but none of them
> improve the results. I'm currently using a config file like this:
>
> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>
> Interesting fact: If I cut off the digits and only send the alphas to
> Tesseract, it recognizes them correctly. Is there something in Tesseract
> that makes it less likely to mix letters and numbers in a single word?
>
> Any suggestions?
>