Thanks for the feedback.

I've already tried "fil+spa", with no success :(
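A minimal sketch of that kind of comparison. It assumes the `tesseract` CLI is on the PATH and that the `fil`, `spa` and `script/Latin` traineddata files are installed in its tessdata directory. The last one is the generic Latin script model Tom mentions, and its install path is an assumption. The helper names are hypothetical. The code runs each model combination on one image and reports whether Ñ/ñ appears in the output. It normalizes to NFC first, so a decomposed N + combining tilde also counts.

```python
import subprocess
import unicodedata
from typing import Dict, List, Sequence

# Hypothetical set of model combinations to compare; '+' tells tesseract
# to load several models at once.
COMBINATIONS: List[Sequence[str]] = [("fil",), ("fil", "spa"), ("script/Latin",)]


def tesseract_cmd(image_path: str, langs: Sequence[str]) -> List[str]:
    """Build a tesseract command line that writes the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def contains_enye(text: str) -> bool:
    """True if Ñ or ñ appears, after composing N/n + U+0303 via NFC."""
    composed = unicodedata.normalize("NFC", text)
    return "Ñ" in composed or "ñ" in composed


def compare(image_path: str) -> Dict[str, bool]:
    """Run every combination on one image; map the '+'-joined langs to Ñ/ñ found."""
    results = {}
    for langs in COMBINATIONS:
        out = subprocess.run(
            tesseract_cmd(image_path, langs),
            capture_output=True, text=True, check=True,
        ).stdout
        results["+".join(langs)] = contains_enye(out)
    return results
```

Usage would be something like `compare("sample.png")`, where `sample.png` is a hypothetical test image that is known to contain Ñ/ñ.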

One thing that worries me is that I cannot find even *one* sample Filipino text
image with Ñ/ñ in it, just to have an independently produced sample. All I
have is a couple of small snippets of text, which produce only the plain
characters.

C.D.

On Sunday, September 22, 2024 at 9:29:13 AM UTC+3 tfmo...@gmail.com wrote:

> On Friday, September 20, 2024 at 12:03:47 PM UTC-4 cdok...@gmail.com 
> wrote:
>
>
> I'm looking into Filipino support in Tesseract OCR. It appears that at
> least Ñ/ñ is not supported. They should be, as you can see here
> <https://en.wikipedia.org/wiki/Filipino_alphabet#Alphabet>.
>
> I'm being told that other Latin characters are also used, like those in
> Spanish. Is this true?
>
>
> The Filipino support definitely looks incomplete. Neither fil.unicharset
> [1] nor the training text [2] includes them. Since it sounds like they are
> principally used for Spanish loan words, one solution might be to use both
> languages (i.e. fil+spa). You could also try the generic Latin script data.
>
> Tom
>
> [1] 
> https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.unicharset
> [2] 
> https://github.com/tesseract-ocr/langdata_lstm/blob/main/fil/fil.training_text
>
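To check Tom's point about [1] directly, here is a small sketch that lists which wanted characters a `.unicharset` file lacks. It assumes the layout seen in langdata_lstm: the entry count on the first line, then one entry per line that begins with the character itself followed by a space. The function names are hypothetical.

```python
import unicodedata
from typing import Iterable, List, Set


def unichars_from_unicharset(text: str) -> Set[str]:
    """Collect the characters listed in a unicharset file's contents.

    Assumes line 1 is the entry count and each later line starts with the
    character, followed by a space and its properties.
    """
    chars = set()
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue  # skip blank lines
        token = line.split(" ", 1)[0]
        # Compose e.g. N + U+0303 into Ñ so both spellings compare equal.
        chars.add(unicodedata.normalize("NFC", token))
    return chars


def missing_chars(unicharset_text: str, wanted: Iterable[str]) -> List[str]:
    """Return the wanted characters (in order) that the unicharset lacks."""
    present = unichars_from_unicharset(unicharset_text)
    return [c for c in wanted if unicodedata.normalize("NFC", c) not in present]
```

With a local copy of fil.unicharset, something like `missing_chars(open("fil.unicharset", encoding="utf-8").read(), ["Ñ", "ñ"])` would list whichever of the two is absent.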

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/246aaa6d-f971-444d-9faf-50b189e4cf0cn%40googlegroups.com.
