See https://github.com/tesseract-ocr/tesseract/issues/758 and other similar issues.
On Sun, Jul 12, 2020 at 6:52 PM Shree Devi Kumar <shreesh...@gmail.com> wrote:

> @Eliyaz What version of tesseract are you using? Which traineddata?
>
> > Always the letter "لا" is predicted as "ال".
>
> I think this was fixed by Ray Smith in 2017 and should be ok in the
> traineddata files in tessdata_fast and tessdata_best repos.
>
> On Sun, Jul 12, 2020 at 6:45 PM Rainer Verteidiger <
> materialdefender2...@gmail.com> wrote:
>
>> > Always the letter "لا" is predicted as "ال".
>>
>> Not sure how much relevancy that bears in the context of training models,
>> but لا is no letter! It's a ligature ("Arabic Ligature Lam with Alef")
>> formed by combining ل ("Arabic Letter Lam") with ا ("Arabic Letter Alef"),
>> whereas ال is ا followed by ل (so, the exact opposite way around; no
>> ligature). Both are incredibly common in Arabic texts and although I have
>> no clue about machine learning, I'm surprised the training could miss
>> the difference between them.
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUsGFxyk6HyA2Ya9fEqEEkTqXmPe9FzSSkB1GY4h2DEQw%40mail.gmail.com.
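The lam-alef vs. alef-lam distinction discussed in this thread comes down to code-point order. In ordinary Unicode text, "لا" is normally stored as the two base letters lam (U+0644) followed by alef (U+0627), and the rendering engine draws the ligature; "ال" is the same two letters in the opposite order. Unicode also has presentation-form code points such as U+FEFB (ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM), which some extracted or legacy text may contain. The Python sketch below is an illustration added here, not something from the thread or from Tesseract's tooling; the helper name `describe` is hypothetical. It only uses the standard `unicodedata` module to show the two orderings and that NFKC normalization folds the presentation-form ligature back to lam + alef.

```python
import unicodedata

LAM = "\u0644"   # ARABIC LETTER LAM
ALEF = "\u0627"  # ARABIC LETTER ALEF

# Logical (stored) order of the lam-alef sequence; renderers draw it as a ligature.
lam_alef = LAM + ALEF
# The definite article "ال": alef first, then lam (no ligature).
alef_lam = ALEF + LAM

# Presentation-form ligature code point (illustrative; may appear in some text).
LIGATURE = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def describe(s):
    """Hypothetical helper: list each code point with its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


for label, s in [("lam+alef", lam_alef), ("alef+lam", alef_lam), ("ligature", LIGATURE)]:
    print(label, describe(s))

# NFKC applies the compatibility decomposition of U+FEFB, yielding lam + alef.
print(unicodedata.normalize("NFKC", LIGATURE) == lam_alef)
```

If ground-truth text for training mixes presentation-form ligatures with base-letter sequences, normalizing it (for example with NFKC) before use is one possible way to keep the two forms consistent; whether that is relevant to the misrecognition reported here is an assumption, not something the thread establishes.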