Well, that would require a lot of additional logic, because the overall layout breaks down into quite a diverse set of regions.
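To give an idea of the kind of logic the per-region suggestion would involve, here is a rough sketch. The region names, the fractional coordinates, and the per-region language and page segmentation mode are all made up for illustration and not measured from a real screenshot. It also assumes Tesseract 4 or newer, where the `--psm` option exists. The code only computes crop boxes and builds command lines; it does not crop or run anything itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    # (left, top, right, bottom) as fractions of the image size
    box: Tuple[float, float, float, float]
    lang: str  # value passed to tesseract's -l option
    psm: int   # page segmentation mode: 6 = uniform block, 7 = single line


# Hypothetical layout for one exercise type; the fractions are guesses,
# not measured from an actual Duolingo screenshot.
REGIONS = [
    Region("prompt", (0.0, 0.10, 1.0, 0.20), "eng", 7),
    Region("sentence", (0.0, 0.20, 1.0, 0.45), "rus", 6),
    Region("feedback", (0.0, 0.80, 1.0, 0.90), "eng", 7),
]


def pixel_box(region: Region, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional box to pixel coordinates for a given image size."""
    left, top, right, bottom = region.box
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def tesseract_args(crop_path: str, out_base: str, region: Region,
                   tessdata_dir: str) -> List[str]:
    """Build the tesseract command line for one already-cropped region."""
    return ["tesseract", crop_path, out_base,
            "-l", region.lang,
            "--psm", str(region.psm),
            "--tessdata-dir", tessdata_dir]


if __name__ == "__main__":
    # Example: a 1080x1920 screenshot; crop files are assumed to exist already.
    for region in REGIONS:
        print(region.name, pixel_box(region, 1080, 1920))
        print(" ".join(tesseract_args(region.name + ".png", region.name,
                                      region, "tessdata")))
```

Each box could then be handed to an image library's crop call (Pillow's `Image.crop` takes the same left, upper, right, lower order) before running the corresponding command. Every exercise type with a different layout would need its own region list, which is exactly the extra work I mean.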
The main question is why Tesseract evidently has such severe trouble with clear, noise-free Russian PNGs, and what could be done about it.

On Thursday, October 8, 2020 at 7:08:28 AM UTC+2 shree wrote:

> Give each region of interest separately.
>
> On Wed, Oct 7, 2020 at 6:01 PM 'd-ka' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>
>> I’d like to process Duolingo screenshots with Tesseract, in order to have
>> exercises worth reiterating in a searchable form (i.e. a text file).
>> However, it just yields gibberish:
>>
>> > tesseract.exe img.jpg img.jpg -l rus+eng --tessdata-dir "\tessdata"
>>
>> [image: FXjEk.png]
>>
>> Э 20:22
>> 51МАВО\М/
>> Тгапз(а{е {15 5еп{епсе
>> Апу диес00п5
>> Уоч аге согтес& |"
>> СОМТИМЧЕ
>> Ч 4
>>
>> - For my inherent neural network, it’s easy to resolve: clear
>>   contrasts, easy font, no scanning artifacts.
>> - It doesn’t read the actual Russian part at all (Вопросы есть?), yet
>>   I don’t find the font weight too light or thin.
>> - No luck with greyscale or increased contrast, or with variations of
>>   rus+eng.
>> - I assume that it’s implicitly UTF-8
>>   <https://stackoverflow.com/questions/9976592/tesseract-does-not-recognize-russian>
>>   and that I already have appropriate trained data
>>   <https://stackoverflow.com/questions/63431711/easily-readable-text-not-recognized-by-tesseract>.
>> - What could help Tesseract to properly parse this seemingly easy
>>   imagery?
>>
>> Thanks so much!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/4978d94a-ec7d-4bce-b8be-cd58576d4ab2n%40googlegroups.com.