[tesseract-ocr] Re: Tesseract mistakes letters for numbers

zdenop Wed, 11 Aug 2021 23:34:56 -0700

tesseract string.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 558
SI312533


I use the language model from https://github.com/tesseract-ocr/tessdata 
and tesseract 4.1.1:
 leptonica-1.81.0 (May 22 2021, 16:14:25) [MSC v.1928 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : 
libtiff 4.2.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
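
The log above shows that the image carries no resolution metadata, so 
tesseract first falls back to 70 dpi and then estimates 558. Since the 
original images are said to be 200 dpi, one thing worth trying is passing 
the resolution explicitly. Below is a minimal Python sketch of such a call, 
assuming the tesseract binary is on PATH. The helper names and the choice of 
--psm 7 (treat the image as a single text line) are my assumptions, not 
something tested on these images, and it may or may not change the result.

```python
import subprocess


def build_tesseract_cmd(image_path, dpi=200, psm=7):
    """Build a tesseract command line that writes the text to stdout.

    Hypothetical helper: dpi=200 matches the resolution stated in the
    original question, psm=7 assumes each image holds one line of text.
    """
    return ["tesseract", image_path, "-", "--dpi", str(dpi), "--psm", str(psm)]


def run_ocr(image_path, dpi=200, psm=7):
    """Run tesseract and return the recognized text without the trailing newline."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, dpi, psm),
        capture_output=True,  # warnings go to stderr, text goes to stdout
        text=True,
        check=True,           # raise if tesseract exits with an error
    )
    return result.stdout.strip()
```

Calling run_ocr("string.jpg") corresponds to running 
"tesseract string.jpg - --dpi 200 --psm 7" in a shell.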
On Wednesday, 21 July 2021 at 20:07:15 UTC+2, [email protected] wrote:

> I need some help. I have a bunch of images of text like this:
>
> [image: sample_si.jpg]
> They are all 200 dpi, black and white images. In over 50% of the cases, 
> Tesseract confuses the "SI" at the front for digits. Most of them are "51", 
> but some are "81" or "31".
>
> I've tried tweaking all of the settings I can find, but none of them 
> improve the results. I'm currently using a config file like this:
>
> tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
>
> Interesting fact: If I cut off the digits and only send the alphas to 
> Tesseract, it recognizes them correctly. Is there something in Tesseract 
> that makes it less likely to mix letters and numbers in a single word?
>
> Any suggestions?
>
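
If the recognition itself cannot be improved, a post-processing step is 
another option. The sketch below is only an illustration and assumes that 
every label starts with exactly two letters followed by digits, as in 
"SI312533"; the substitution table contains only the confusions reported in 
the question ("51", "81", "31" instead of "SI"). The function name and the 
table are hypothetical and would need adjusting for other prefixes.

```python
# Digit-to-letter substitutions taken from the confusions reported in the
# question. Assumption: the first two characters are always letters.
LOOKALIKES = {"5": "S", "8": "S", "3": "S", "1": "I"}


def fix_prefix(text, prefix_len=2):
    """Replace look-alike digits in the leading letter prefix of an OCR result."""
    text = text.strip()
    # Only the prefix is corrected; the numeric part is left untouched.
    head = "".join(LOOKALIKES.get(ch, ch) for ch in text[:prefix_len])
    return head + text[prefix_len:]


print(fix_prefix("51312533"))  # SI312533
```

This only works when the expected format is known in advance. It does not 
fix the recognition, it just corrects the result afterwards.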

