(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract rus.png - -l rus+eng --tessdata-dir ~/tessdata_best


D 20:22 Э
5IN AROW 5IN AROW 5IN AROW
Translate this sentence Translate this sentence Translate this sentence
(0) Вопросы есть? (0) Вопросы есть? Вопросы есть?
Апу questions Any questions Any questions
15 15 IS
You are correct |" You are correct |" You are correct |"

CONTINUE [elo] Nay 1]V] CONTINUE

A A 4
(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract rus.png - -l rus+eng --tessdata-dir ~/tessdata_fast


О 20:22 9)
5 1МАВОМ/ 5INAROW 5INAROW
Translate this sentence Translate this sentence Translate this sentence
о Вопросы есть? о Вопросы есть? Вопросы есть?
Апу questions Any questions Any questions
15 15 15
You are correct [мы You are correct [мы You are correct [мы

CONTINUE CONTINUE CONTINUE

” a 7 в
(base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract rus.png - -l rus+eng --tessdata-dir ~/tessdata


D 20:22 °)
5IN AROW 5IN AROW 5IN AROW
Translate this sentence Translate this sentence Translate this sentence
Ф Вопросы есть? Ф Вопросы есть? Вопросы есть?
Апу questions Any questions Any questions
15 15 IS
You are correct |- You are correct |- You аге correct |-

CONTINUE СОМПИМЧЕ СОМТПИЧЧЕ

о ) 4
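
The three runs above differ only in the --tessdata-dir argument (tessdata_best, tessdata_fast and the standard tessdata). A minimal Python sketch that repeats the same comparison is given here; it assumes the tesseract binary is on PATH and that the three directories sit in the home directory as in the session above. The helper names build_command and compare_models are hypothetical. Tesseract's text output is UTF-8, so the sketch decodes it explicitly as UTF-8 rather than relying on the locale default, which matters for Cyrillic on systems such as Windows.

```python
import os
import subprocess

# The three model sets tried in the shell session; the paths are assumed
# to live in the home directory exactly as in that session.
TESSDATA_DIRS = ("~/tessdata_best", "~/tessdata_fast", "~/tessdata")


def build_command(image, tessdata_dir, langs="rus+eng"):
    """Return argv for `tesseract IMAGE - -l LANGS --tessdata-dir DIR`.

    The output base "-" makes tesseract print the text to stdout.
    "~" is expanded here because no shell is involved.
    """
    return [
        "tesseract", image, "-",
        "-l", langs,
        "--tessdata-dir", os.path.expanduser(tessdata_dir),
    ]


def compare_models(image, dirs=TESSDATA_DIRS, langs="rus+eng"):
    """Run tesseract once per tessdata directory; map dir -> recognized text."""
    results = {}
    for d in dirs:
        proc = subprocess.run(
            build_command(image, d, langs),
            capture_output=True,
            encoding="utf-8",  # decode tesseract's UTF-8 output explicitly
            check=True,        # raise CalledProcessError if tesseract fails
        )
        results[d] = proc.stdout
    return results
```

For example, `for d, text in compare_models("rus.png").items(): print(d, text, sep="\n")` prints the three outputs one after another, in the same order as the session above.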

On Fri, Jan 8, 2021 at 1:44 AM 'd-ka' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:

> I still fail to understand why Tesseract performs so poorly. Isn’t it made
> for OCR in screenshots? Doesn’t it understand Russian at all?
>
> On Monday, November 2, 2020 at 5:45:41 PM UTC+1 d-ka wrote:
>
>> Well, that would require a lot of additional logic, because the general
>> layout calls for quite diverse segmentation.
>>
>> The main question is why Tesseract obviously has severe trouble with
>> clear, noise-free Russian PNGs, and what could be done about it.
>>
>> On Thursday, October 8, 2020 at 7:08:28 AM UTC+2 shree wrote:
>>
>>> Pass each region of interest to Tesseract separately.
>>>
>>>
>>> On Wed, Oct 7, 2020 at 6:01 PM 'd-ka' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>>>
>>>>
>>>> I’d like to process Duolingo screenshots with Tesseract in order to
>>>> keep exercises worth revisiting in a searchable form (i.e. a text file).
>>>> However, it just yields gibberish:
>>>>
>>>> > tesseract.exe img.jpg img.jpg -l rus+eng --tessdata-dir "\tessdata"
>>>>
>>>> [image: FXjEk.png]
>>>>
>>>> Э 20:22
>>>> 51МАВО\М/
>>>> Тгапз(а{е {15 5еп{епсе
>>>> Апу диес00п5
>>>> Уоч аге согтес& |"
>>>> СОМТИМЧЕ
>>>> Ч 4
>>>>
>>>>
>>>>    - For my own (biological) neural network, it’s easy to read: clear
>>>>    contrast, a simple font, no scanning artifacts.
>>>>    - It doesn’t read the actual Russian part at all (Вопросы есть?,
>>>>    "Any questions?"), yet I don’t find the font weight too light or thin.
>>>>    - No luck with greyscale, increased contrast, or variations of
>>>>    rus+eng.
>>>>    - I assume that the output is implicitly UTF-8
>>>>    <https://stackoverflow.com/questions/9976592/tesseract-does-not-recognize-russian>
>>>>    and that I already have the appropriate trained data
>>>>    <https://stackoverflow.com/questions/63431711/easily-readable-text-not-recognized-by-tesseract>.
>>>>    - What could help Tesseract properly parse this seemingly easy
>>>>    image?
>>>>
>>>> Thanks so much!
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/4978d94a-ec7d-4bce-b8be-cd58576d4ab2n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4978d94a-ec7d-4bce-b8be-cd58576d4ab2n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> Bhajans - Kirtans - Aartis (Hindu devotional songs and prayers) @ http://bhajans.ramparivar.com
>>>
>
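
The suggestion in the quoted thread, to give each region of interest to Tesseract separately, can be sketched in Python. This is a minimal sketch under assumptions not stated in the thread: it uses the third-party Pillow and pytesseract packages, and the helper names column_boxes and ocr_regions are hypothetical. column_boxes only fits an image made of equal-width panels placed side by side; judging by each line of rus.png being recognized three times, that image may be three screenshots next to each other, but this is a guess. Boxes for individual screen regions (prompt, Russian sentence, answer, button) would have to be measured by hand. Tesseract's --psm 6 ("assume a single uniform block of text") is one reasonable choice for a cropped region; --psm 7 treats the crop as a single text line.

```python
def column_boxes(width, height, n):
    """Split a width x height image into n equal vertical strips.

    Returns Pillow-style (left, upper, right, lower) boxes. The last strip
    absorbs any remainder pixels so the strips cover the full width.
    """
    step = width // n
    boxes = []
    for i in range(n):
        right = width if i == n - 1 else (i + 1) * step
        boxes.append((i * step, 0, right, height))
    return boxes


def ocr_regions(path, boxes, langs="rus+eng", psm=6):
    """Crop each box out of the image at `path` and OCR it on its own."""
    # Optional third-party dependencies (assumed installed), imported here
    # so that column_boxes stays usable without them.
    from PIL import Image
    import pytesseract

    config = f"--psm {psm}"
    with Image.open(path) as img:
        # Crop and recognize every region while the file is still open.
        return [
            pytesseract.image_to_string(img.crop(box), lang=langs, config=config)
            for box in boxes
        ]
```

A possible call, again assuming three side-by-side panels, would be `ocr_regions("rus.png", column_boxes(width, height, 3))`, where width and height are the image's pixel dimensions.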

