[tesseract-ocr] Whitelist is not accepting special characters

Shadya S. Sat, 26 Aug 2023 21:25:20 -0700

I'm using Tesseract (version 5.3.1) on Windows to recognize text that 
includes special characters such as ñ, ü and á. Most of these characters 
belong to the Latin script, so I pass -l Latin on the command line.


In this image, the special characters are ñ, Ñ, á and é.
[image: text.png]

The command line I'm using is:

    tesseract text.png stdout --psm 6 -l Latin -c tessedit_char_whitelist=aáeéiocfhklmnñtÑ

However, the output text is missing the spaces between words, and the 
special characters are ignored completely. The result is:
*aoloaalcalmoo*
*okonioniachillalif *


Do you know why Tesseract is not taking into account the characters I've 
declared in the whitelist? Maybe I'm not specifying the special characters 
correctly.
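
One way to check how the special characters are being specified is to 
inspect their code points and to hand the command to Tesseract as an 
argument list instead of typing it into the console. The following is only a 
minimal sketch, assuming Python 3.9+ and a tesseract executable on PATH. The 
helper names (normalize_whitelist, describe, build_command, run_ocr) are 
hypothetical, and nothing here is a confirmed cause of the behaviour above.

```python
import subprocess
import unicodedata


def normalize_whitelist(chars: str) -> str:
    """Return the whitelist in composed (NFC) form, duplicates removed, order kept."""
    composed = unicodedata.normalize("NFC", chars)
    return "".join(dict.fromkeys(composed))


def describe(chars: str) -> list[str]:
    """List each character's code point and Unicode name for inspection."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in chars]


def build_command(image: str, whitelist: str) -> list[str]:
    """Build the same command as in the post, as an argument list (no shell)."""
    return [
        "tesseract", image, "stdout",
        "--psm", "6",
        "-l", "Latin",
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image: str, whitelist: str) -> str:
    """Run Tesseract and return its stdout decoded as UTF-8 (assumes tesseract on PATH)."""
    result = subprocess.run(
        build_command(image, normalize_whitelist(whitelist)),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling describe("ñÑáé") shows whether each character is a single composed 
code point, and run_ocr("text.png", "aáeéiocfhklmnñtÑ") would run the command 
from the post with the whitelist passed as a Unicode string.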

Any help is greatly appreciated.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/843a1439-45ba-422c-8ba8-40fa557938b3n%40googlegroups.com.
