[tesseract-ocr] Re: Filipino character set (alphabet) support

Tom Morris Sat, 21 Sep 2024 23:29:18 -0700

On Friday, September 20, 2024 at 12:03:47 PM UTC-4 [email protected] wrote:



I'm looking into Filipino support in Tesseract OCR. It appears that at 
least Ñ/ñ is not supported. It should be, as you can see here 
<https://en.wikipedia.org/wiki/Filipino_alphabet#Alphabet>.

I'm being told that other Latin characters are also used, like those in 
Spanish. Is this true?


The Filipino support definitely looks incomplete. Neither fil.unicharset 
[1] nor the training text [2] includes Ñ/ñ. Since it sounds like these 
characters are principally used for Spanish loan words, one solution might 
be to use both languages (i.e. fil+spa; spa is Tesseract's code for 
Spanish). You could also try the generic Latin script data.
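
For illustration, here is a minimal sketch of how the two-language run could be driven from Python through the tesseract command line. It assumes the tesseract executable is on PATH and that the fil and spa traineddata files (and, for the alternative, script/Latin) are installed. The function names and the file name scan.png are hypothetical, chosen only for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages):
    """Return the argv list that OCRs image_path to stdout.

    Tesseract accepts several languages joined with '+', e.g. "fil+spa".
    """
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr(image_path, languages):
    """Run the tesseract CLI and return the recognized text."""
    # Assumption: tesseract is installed and reachable on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example calls ("scan.png" is a placeholder image name):
#   ocr("scan.png", ["fil", "spa"])     # Filipino plus Spanish
#   ocr("scan.png", ["script/Latin"])   # generic Latin script data, if installed
```

Whether fil+spa or the Latin script data gives better results on a given document is something to check empirically; the sketch only shows how to switch between them.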

Tom

[1] https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.unicharset
[2] https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.training_text
