tesseract string.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 558
SI312533
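The warning in that output means the image file carries no resolution metadata, so Tesseract guesses one. A minimal sketch of how the same call could be made with the resolution and layout stated explicitly is below. It assumes the images really are 200 dpi (as stated in the quoted message) and that each image holds a single line of text (hence `--psm 7`). The helper names `build_tesseract_cmd` and `run_ocr` are made up for illustration, and whether these options change the result for a given image is not something I have verified.

```python
import subprocess

# Same character set as the config file in the quoted message.
WHITELIST = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def build_tesseract_cmd(image_path, dpi=200, psm=7):
    """Build the argument list for a tesseract CLI call (hypothetical helper).

    Assumptions: the image is `dpi` dots per inch and contains one text line.
    """
    return [
        "tesseract", image_path, "-",   # "-" sends the recognized text to stdout
        "--dpi", str(dpi),              # state the resolution instead of letting it be estimated
        "--psm", str(psm),              # 7 = treat the image as a single text line
        "-c", f"tessedit_char_whitelist={WHITELIST}",
    ]


def run_ocr(image_path):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_ocr("string.jpg")` requires the `tesseract` executable to be on the PATH.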
I use the language models from https://github.com/tesseract-ocr/tessdata with this Tesseract build:

tesseract 4.1.1
 leptonica-1.81.0 (May 22 2021, 16:14:25) [MSC v.1928 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE

On Wednesday, July 21, 2021 at 20:07:15 UTC+2, [email protected] wrote:

> I need some help. I have a bunch of images of text like this:
>
> [image: sample_si.jpg]
>
> They are all 200 dpi, black and white images. In over 50% of the cases,
> Tesseract confuses the "SI" at the front for digits. Most of them are "51",
> but some are "81" or "31".
>
> I've tried tweaking all of the settings I can find, but none of them
> improve the results. I'm currently using a config file like this:
>
> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>
> Interesting fact: If I cut off the digits and only send the alphas to
> Tesseract, it recognizes them correctly. Is there something in Tesseract
> that makes it less likely to mix letters and numbers in a single word?
>
> Any suggestions?
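If the misreads described in the quoted message persist, one workaround is to correct the prefix after recognition. The sketch below is only an illustration and rests on an assumption that the thread does not confirm: that every string is "SI" followed by exactly six digits, like the sample SI312533. The function name `fix_si_prefix` is hypothetical. It rewrites the leading "51", "81" or "31" to "SI" only when the whole token fits that shape, and leaves anything else untouched.

```python
import re

# Assumed format (not confirmed in the thread): "SI" followed by six digits.
# The first character may have been misread as 5, 8 or 3, the second as 1.
MISREAD_PATTERN = re.compile(r"[S583][I1]\d{6}")


def fix_si_prefix(text):
    """Restore a leading "SI" that OCR turned into digits (hypothetical helper)."""
    token = text.strip().upper()
    if MISREAD_PATTERN.fullmatch(token):
        return "SI" + token[2:]
    # Anything that does not fit the assumed format is returned unchanged.
    return token


print(fix_si_prefix("51312533"))
print(fix_si_prefix("81312533"))
```

This does not address why the recognizer prefers digits in the first place; it only applies when the expected format is known in advance.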

