Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

Rainer Verteidiger Sun, 12 Jul 2020 06:15:14 -0700

 

The letter "لا" is always predicted as "ال".


I'm not sure how relevant this is to training models, but لا is not a 
letter. It is a ligature ("Arabic Ligature Lam with Alef") formed by 
combining ل ("Arabic Letter Lam") with ا ("Arabic Letter Alef"), whereas 
ال is ا followed by ل (the same two letters in the opposite order, with no 
ligature). Both sequences are extremely common in Arabic text, and although 
I know nothing about machine learning, I'm surprised that training could 
miss the difference between them.
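
For anyone who wants to check which of the two sequences actually ends up in their ground truth or OCR output, here is a minimal sketch using only Python's standard unicodedata module. The helper name describe and the constant names are my own, purely for illustration, and none of this is part of Tesseract. It lists the code points of each string in logical (storage) order, and also shows that the precomposed presentation form U+FEFB normalizes to Lam followed by Alef under NFKC, which may matter if the training text mixes both encodings.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of each code point, in logical (storage) order."""
    return [unicodedata.name(ch) for ch in text]


# Hypothetical constant names, chosen for this illustration only.
LAM_ALEF = "\u0644\u0627"  # Lam then Alef: fonts render this pair as a ligature
ALEF_LAM = "\u0627\u0644"  # Alef then Lam: same letters, opposite order, no ligature
LIGATURE = "\uFEFB"        # precomposed presentation form of the Lam-Alef ligature

print(describe(LAM_ALEF))
print(describe(ALEF_LAM))
print(unicodedata.name(LIGATURE))

# NFKC replaces the presentation form with the two plain letters.
print(unicodedata.normalize("NFKC", LIGATURE) == LAM_ALEF)
```

If the two lists come out in the order you did not expect, the problem may be in how the text was stored or reversed before training rather than in the model itself, but that is only a guess on my part.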

