[tesseract-ocr] Re: Filipino character set (alphabet) support

Constantine Dokolas Sun, 22 Sep 2024 06:00:35 -0700

Thanks for the feedback.

I've already tried "fil+spa", with no success :(

One thing that worries me is that I cannot find a *single* sample Filipino 
text image with Ñ/ñ in it, which would give me an independently produced 
sample. All I have is a couple of small snippets of text, which produce the 
plain characters only.

C.D.

On Sunday, September 22, 2024 at 9:29:13 AM UTC+3 [email protected] wrote:

> On Friday, September 20, 2024 at 12:03:47 PM UTC-4 [email protected] 
> wrote:
>
>
> I'm looking into Filipino support in Tesseract OCR. It appears that at 
> least Ñ/ñ is not supported. It should be, as you can see here 
> <https://en.wikipedia.org/wiki/Filipino_alphabet#Alphabet>.
>
> I'm being told that other Latin characters are also used, like those in 
> Spanish. Is this true?
>
>
> The Filipino support definitely looks incomplete. Neither fil.unicharset 
> [1] nor the training text [2] includes Ñ/ñ. Since it sounds like they are 
> principally used for Spanish loan words, one solution might be to use both 
> languages (i.e. fil+spa). You could also try the generic Latin script data.
>
> Tom
>
> [1] 
> https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.unicharset
> [2] 
> https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.training_text
>
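
The unicharset check mentioned above can be repeated locally. What follows is 
a minimal sketch, not part of the original messages: it assumes the usual 
unicharset layout (an entry count on the first line, then one entry per line 
with the character as the first space-separated field). The names 
unicharset_symbols, missing_chars and SAMPLE are hypothetical, and SAMPLE is a 
simplified stand-in for the real fil.unicharset, not its actual contents.

```python
import unicodedata

# Simplified, made-up stand-in for a unicharset file (assumption: real
# entries carry more property fields, but the character is always first).
SAMPLE = """4
NULL 0 Common 0
a 3 Latin 1
n 3 Latin 2
N 5 Latin 3
"""


def unicharset_symbols(text):
    """Return the set of symbols listed in unicharset-formatted text."""
    lines = text.splitlines()
    symbols = set()
    # Skip the first line, which holds the number of entries.
    for line in lines[1:]:
        if not line.strip():
            continue
        first_field = line.split(" ", 1)[0]
        # Normalize so a decomposed "n" + combining tilde compares equal to "ñ".
        symbols.add(unicodedata.normalize("NFC", first_field))
    return symbols


def missing_chars(text, wanted):
    """Return the characters from `wanted` that the unicharset does not list."""
    have = unicharset_symbols(text)
    wanted = unicodedata.normalize("NFC", wanted)
    return [ch for ch in wanted if ch not in have]


if __name__ == "__main__":
    print(missing_chars(SAMPLE, "Ññn"))
```

To check the real data, read a downloaded copy of fil.unicharset [1] with 
encoding="utf-8" and pass its contents in place of SAMPLE.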
