Hi all, I'm trying to OCR images similar to the attached one, but the error rate on Latin words is too high.
I tried all PSMs with the following models from tessdata_best: *ara*, *eng*, *fra*, and *Ara*, combined in different orders. I even tried fine-tuning them on the font used in the input images.

Sample output (error in bold):

قرارلمجلس المنافسة عدد 0028/ق/2022 صبادر25 من شعبان 1443 (28 مارس 2022) والمتعلق بتولي الشركة القابضة للمساهمات والاستثمارات *«11010108-:2م1]»* للمر اقبة المشتركة على شركة «CMGP Group Sa» وذلك عبراقتناء نسبة14,81 96 من أسيم رأسمالها وحقوق التصويت المرتبطة به.

The Latin words are often recognized incorrectly. Is there any solution to this issue?
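In case it helps anyone reproduce this, here is a minimal sketch of how such a sweep over language orders and PSMs could be scripted. It assumes the `tesseract` command-line tool is on the PATH and that the tessdata_best models are installed where it can find them. The file name `input.png` and the helper names `build_commands` and `run_all` are hypothetical, and the script model (*Ara* above) is left out because the name to pass to `-l` depends on how it was installed.

```python
import itertools
import subprocess

# Language models from the post; the script model is omitted (see note above).
LANGS = ["ara", "eng", "fra"]
# Tesseract page segmentation modes are numbered 0 through 13.
# Some modes (e.g. 0, OSD only) do not return recognized text.
PSMS = range(14)


def build_commands(image, langs=LANGS, psms=PSMS):
    """Return one tesseract command per (language order, PSM) pair."""
    commands = []
    for order in itertools.permutations(langs):
        for psm in psms:
            commands.append(
                ["tesseract", image, "stdout",
                 "-l", "+".join(order), "--psm", str(psm)]
            )
    return commands


def run_all(image):
    """Run every command and yield (command, recognized text)."""
    for cmd in build_commands(image):
        result = subprocess.run(cmd, capture_output=True, text=True)
        yield cmd, result.stdout


if __name__ == "__main__":
    # "input.png" is a placeholder for the attached sample image.
    for cmd, text in run_all("input.png"):
        print(" ".join(cmd))
        print(text)
```

Printing each command next to its output makes it easier to see which language order and PSM gives the fewest errors on the Latin words.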