Re: [tesseract-ocr] Make russian_with_accent traineddata file

Romain B. (Le Belge) Fri, 09 Feb 2024 03:03:08 -0800

Here is all the information needed to reproduce my problem:

Here is an image from my Russian textbook (French edition):
[image: testTes2.png]
If you run Tesseract on it with the Russian and French languages, using 
this command: 
*tesseract testTes2.png stdout -l rus+fra*
you get this result:
[image: Capture d’écran du 2024-02-09 11-47-14.png]
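
For anyone who wants to script the reproduction, here is a minimal Python sketch that assembles the same command. The helper names build_tesseract_command and run_ocr are hypothetical, and the sketch assumes the tesseract executable and the rus and fra traineddata files are installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages):
    """Return the argv list for: tesseract <image> stdout -l <lang1+lang2>."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_ocr(image_path, languages):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable is on PATH and the requested
    language models are installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example call (needs Tesseract and the image from this message):
# text = run_ocr("testTes2.png", ["rus", "fra"])
```

Keeping the command construction separate from the call makes it easy to check the exact arguments without having Tesseract installed.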


As you can see, Tesseract is not used to Russian vowels carrying accents 
(which, again, are only used for educational purposes), so it misreads 
ó as б, é as ё, and so on.
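
To make the confusion concrete, here is a small Python sketch. It assumes the stressed vowels are encoded as the plain Cyrillic letter followed by the combining acute accent U+0301, which is a common convention but not something confirmed for this book. The helper strip_stress_marks is hypothetical. It removes only that mark, so OCR output can be compared against unaccented reference text while genuine letters such as ё and й are left intact.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # assumed encoding of the stress mark


def strip_stress_marks(text):
    """Remove combining acute accents, keeping letters such as ё and й."""
    # NFD splits precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_stress = decomposed.replace(COMBINING_ACUTE, "")
    # NFC recomposes the remaining marks (diaeresis of ё, breve of й).
    return unicodedata.normalize("NFC", without_stress)


stressed = "молоко\u0301"  # "молоко" with a stress mark on the final о
print(strip_stress_marks(stressed))
```

This only normalizes text for comparison. It does not make Tesseract recognize the accents, which is what re-training would be for.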

I'm trying to fix this issue. From what I have read, I think I need to 
re-train the Russian model in Tesseract so that it supports accents.
I found this 
<https://github.com/tesseract-ocr/langdata/tree/main/rus_accent> folder in 
langdata, but I can't find a way to use it to re-train the Russian model.

How can I use the rus_accent folder and its files to re-train the 
Russian model easily?

I hope my explanation was clear enough. (Sorry for any grammatical or 
other mistakes; English is not my native language.)


On Tuesday, 6 February 2024 at 18:51:01 UTC+1, zdenop wrote:

> You are referring to an old issue...
> You either provide steps to replicate your problem (including input image) 
> or you have to solve it by yourself.
>
> Zdenko
>
>
> On Mon, 5 Feb 2024 at 9:53, Romain B. (Le Belge) <romainbar...@gmail.com> 
> wrote:
>
>> Hi,
>> <https://stackoverflow.com/posts/77897165/timeline>
>>
>> I saw that Tesseract mistakenly turns Russian vowels with accents 
>> (ò, à, ...), which are mostly used for educational purposes, into other 
>> Russian letters. I also saw that someone with the same problem 
>> <https://github.com/tesseract-ocr/langdata/pull/12> had created training 
>> data (if I understood correctly) for Russian with accents 
>> <https://github.com/tesseract-ocr/langdata/tree/main/rus_accent>
>>
>> The problem is that I cannot find a way to turn it into a traineddata 
>> file, to test it and later use it in my code. I found the tesstrain 
>> <https://github.com/tesseract-ocr/tesstrain> repository, but was not able 
>> to make it work with the data I found.
>>
>> I honestly don't know whether I am missing something, misunderstanding 
>> something, or whether data is simply no longer trained with these types 
>> of files.
>>
>> If you have any clue, that would help me a lot.
>>
>> Thank you!
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/72616f7b-fff4-46ca-8c00-0d186dbdde06n%40googlegroups.com.
