@Eliyaz What version of tesseract are you using? Which traineddata?

>Always the letter "لا" is predicted as "ال" .

I think this was fixed by Ray Smith in 2017, so it should be OK with the
traineddata files in the tessdata_fast and tessdata_best repos.
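
If it helps when checking your output: the two strings in the quoted line
contain the same two characters and differ only in their stored order. Below
is a minimal sketch using only Python's standard unicodedata module. The names
LAM, ALEF, lam_alef, alef_lam and describe are mine, for illustration only,
and nothing here comes from Tesseract itself. It assumes the OCR text holds
the plain letters (U+0644, U+0627); the last lines cover the case where the
single presentation-form ligature U+FEFB was produced instead.

```python
import unicodedata

LAM = "\u0644"   # ARABIC LETTER LAM
ALEF = "\u0627"  # ARABIC LETTER ALEF

# Text is stored in logical (reading) order, not in visual order.
lam_alef = LAM + ALEF   # "la": lam followed by alef, drawn as a ligature
alef_lam = ALEF + LAM   # "al": alef followed by lam, no ligature


def describe(text):
    """Return the code point and Unicode name of each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(lam_alef))
print(describe(alef_lam))

# Some fonts and tools emit the ligature as one presentation-form code point.
# NFKC normalization folds it back into the two plain letters.
ligature = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(unicodedata.normalize("NFKC", ligature) == lam_alef)
```

Running describe() on the recognized text and on the ground truth shows
whether the two really differ in character order or only in how the ligature
is encoded.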

On Sun, Jul 12, 2020 at 6:45 PM Rainer Verteidiger <
materialdefender2...@gmail.com> wrote:

>
> Always the letter "لا" is predicted as "ال" .
>
> Not sure how much relevancy that bears in the context of training models,
> but لا is no letter! It's a ligature ("Arabic Ligature Lam with Alef")
> formed by combining ل ("Arabic Letter Lam") with ا ("Arabic Letter Alef")
> whereas ال is ا followed by ل (so, the exact opposite way around; no
> ligature). Both are incredibly common in Arabic texts and although I have
> no clue about machine learning, I'm surprised how the training could miss
> the difference between them.


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXcKkS7_FmnNpgkDDdwwJmu1v4P8U7dA5yANk8VYHcmDQ%40mail.gmail.com.