IMO there is no need to use psm and a whitelist:

tesseract text.png - -l fast/script/Latin
Estimating resolution as 274
Ñato ñelo ñaña álca moño
Ñoko niño niña chillňa élif

On Windows, I guess there could be a problem with UTF-8 in the terminal...

Zdenko

On Sun, 27 Aug 2023 at 6:25, Shadya S. <[email protected]> wrote:

> I'm using Tesseract (version 5.3.1) on Windows to recognize characters
> from a text that includes special characters like ñüá. Most of these
> characters are within the Latin script, so I've declared this in the
> command line.
>
> In this image, the special characters are ñ, Ñ, á, é.
> [image: text.png]
>
> The command line I'm using is
>
> tesseract text.png stdout --psm 6 -l Latin -c tessedit_char_whitelist=aáeéiocfhklmnñtÑ
>
> However, the output text is missing the white space between words, and
> the special characters are being completely ignored, resulting in:
>
> aoloaalcalmoo
> okonioniachillalif
>
> Do you know why Tesseract is not taking into account the characters I've
> declared in the whitelist? Maybe I'm not specifying the special
> characters correctly.
>
> Any help is greatly appreciated.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wN2R0MOBV7HNg%2BZYbXYU%2BfpPhKkKNgM0t4J0saWPm%2Bug%40mail.gmail.com.
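
One way to check the UTF-8 guess without depending on the terminal is to capture tesseract's raw output bytes and decode them explicitly. The following is only a minimal sketch, assuming Python 3 and a `tesseract` executable on PATH; the helper names (`build_command`, `ocr_text`, `save_text`, `as_seen_in_legacy_console`) are made up for this illustration, and `cp1252` is just an example of a legacy code page, not necessarily the one a given Windows console uses.

```python
import subprocess


def build_command(image_path, lang="fast/script/Latin"):
    """Build the tesseract call from the reply; "-" sends the text to stdout."""
    return ["tesseract", image_path, "-", "-l", lang]


def ocr_text(image_path, lang="fast/script/Latin"):
    """Run tesseract and decode its raw stdout bytes as UTF-8 (requires tesseract)."""
    result = subprocess.run(
        build_command(image_path, lang), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8")


def save_text(text, out_path):
    """Write the text as UTF-8 so it can be inspected in an editor."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)


def as_seen_in_legacy_console(text, codepage="cp1252"):
    """Simulate UTF-8 bytes being interpreted in a legacy code page (assumed cp1252)."""
    return text.encode("utf-8").decode(codepage, errors="replace")
```

Calling `save_text(ocr_text("text.png"), "text_out.txt")` and opening the file in a UTF-8 aware editor shows what tesseract really recognized. If ñ and á are intact there but not in the console, the terminal encoding is the likely culprit rather than the OCR. `as_seen_in_legacy_console("ñ")` illustrates the kind of mangling to expect in that case.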

