Well, that would require a lot of additional logic, because the overall 
layout would need quite diverse segmentation.

The main question is why Tesseract has such severe trouble with clear, 
noise-free Russian PNGs, and what can be done about it.
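
For reference, the per-region suggestion quoted below might look roughly like 
the following. This is only a minimal sketch, not a tested solution for these 
screenshots: the region names and pixel boxes are made-up placeholders, Pillow 
is assumed to be installed for cropping, and a `tesseract` binary is assumed 
to be on the PATH. The flags used (`-l`, `--psm`, `--tessdata-dir`) are 
standard Tesseract CLI options; `--psm 6` treats each crop as one uniform 
block of text.

```python
import subprocess
from pathlib import Path

# Hypothetical regions as (left, top, right, bottom) pixel boxes. Real values
# depend on the screenshot layout and are not taken from this thread.
REGIONS = {
    "prompt": (0, 200, 1080, 400),
    "answer": (0, 400, 1080, 700),
}


def build_command(image_path, out_base, lang="rus+eng", psm=6, tessdata_dir=None):
    """Return the tesseract CLI call (as an argument list) for one cropped region."""
    cmd = ["tesseract", str(image_path), str(out_base), "-l", lang, "--psm", str(psm)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def ocr_regions(screenshot, regions=REGIONS, workdir="crops"):
    """Crop each region, run tesseract on it separately, and return {name: text}."""
    from PIL import Image  # Pillow, imported lazily; assumed to be installed

    Path(workdir).mkdir(exist_ok=True)
    results = {}
    with Image.open(screenshot) as img:
        for name, box in regions.items():
            crop_path = Path(workdir) / f"{name}.png"
            img.crop(box).save(crop_path)
            # tesseract appends ".txt" to the output base by itself
            out_base = crop_path.with_suffix("")
            subprocess.run(build_command(crop_path, out_base), check=True)
            results[name] = out_base.with_suffix(".txt").read_text(encoding="utf-8")
    return results
```

Calling `ocr_regions("img.png")` would then return one text per region. The 
hard part remains choosing the boxes, which is exactly the extra logic 
mentioned above.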

On Thursday, October 8, 2020 at 7:08:28 AM UTC+2 shree wrote:

> Give each region of interest separately.
>
>
> On Wed, Oct 7, 2020 at 6:01 PM 'd-ka' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>>
>> I’d like to process Duolingo screenshots with Tesseract, in order to have 
>> exercises worth reiterating in a searchable form (i.e. a text file). 
>> However, it just yields gibberish:
>>
>> > tesseract.exe img.jpg img.jpg -l rus+eng --tessdata-dir "\tessdata"
>>
>> [image: FXjEk.png]
>>
>> Э 20:22
>> 51МАВО\М/
>> Тгапз(а{е {15 5еп{епсе
>> Апу диес00п5
>> Уоч аге согтес& |"
>> СОМТИМЧЕ
>> Ч 4
>>
>>
>>    - For my inherent neural network, it’s easy to resolve: clear 
>>    contrasts, easy font, no scanning artifacts.
>>    - It doesn’t read the actual Russian part at all (Вопросы есть?), yet 
>>    I don’t find the font weight too light or thin.
>>    - No luck with greyscale or increased contrast, or with variations of 
>>    rus+eng.
>>    - I assume that it’s implicitly UTF-8 
>>    <https://stackoverflow.com/questions/9976592/tesseract-does-not-recognize-russian> 
>>    and that I already have appropriate trained data 
>>    <https://stackoverflow.com/questions/63431711/easily-readable-text-not-recognized-by-tesseract>.
>>    - What could help Tesseract to properly parse this seemingly easy 
>>    imagery?
>>
>> Thanks so much!
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/4978d94a-ec7d-4bce-b8be-cd58576d4ab2n%40googlegroups.com
>>
>
