Here is all the information needed to reproduce my problem.

Here is an image from my Russian textbook (French edition): [image: testTes2.png]

If you run Tesseract on it with the Russian and French languages, using this command:

*tesseract testTes2.png stdout -l rus+fra*

you will get this result: [image: Capture d’écran du 2024-02-09 11-47-14.png]
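For anyone who prefers to script the reproduction, here is a minimal Python sketch that runs the same command. It assumes the `tesseract` binary is on the PATH and that the `rus` and `fra` traineddata files are installed; the function names `build_tesseract_command` and `run_ocr` are only illustrative and are not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("rus", "fra")):
    """Build the argv list equivalent to: tesseract <image> stdout -l rus+fra."""
    # Tesseract accepts several languages joined with "+" after the -l option.
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_ocr(image_path, languages=("rus", "fra")):
    """Run Tesseract on the image and return its text output.

    Returns None when the tesseract binary cannot be found on the PATH
    (assumption: a normal command-line install of Tesseract).
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        encoding="utf-8",  # the recognized text contains Cyrillic characters
        check=True,        # raise if tesseract exits with an error
    )
    return result.stdout
```

Calling `run_ocr("testTes2.png")` from the directory that contains the image should return the same text as the command above.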
As you can see, Tesseract, which does not expect accents on Russian vowels (again, these are only used for educational purposes), reads ó as б, é as ё, and so on.

I'm trying to fix this issue. From what I have read, I think I need to retrain the Russian language in Tesseract so that it supports accents. I found this folder in langdata: <https://github.com/tesseract-ocr/langdata/tree/main/rus_accent>, but I can't find a way to use it to retrain the Russian language.

How can I use the rus_accent folder and its files to easily retrain the Russian language?

I hope my explanation was clear enough. (Sorry for any grammatical or other mistakes; English is not my native language.)

On Tuesday, 6 February 2024 at 18:51:01 UTC+1, zdenop wrote:

> You are referring to an old issue...
> You either provide steps to replicate your problem (including the input image)
> or you have to solve it yourself.
>
> Zdenko
>
>
> On Mon, 5 Feb 2024 at 9:53, Romain B. (Le Belge) <romainbar...@gmail.com>
> wrote:
>
>> Hi,
>> <https://stackoverflow.com/posts/77897165/timeline>
>>
>> I saw that Tesseract makes the mistake of turning Russian vowels with
>> accents (ò, à, ...), which are mostly used for educational purposes, into
>> other Russian letters. I also saw that someone with the same problem
>> <https://github.com/tesseract-ocr/langdata/pull/12> had created training
>> data (if I understood correctly) for Russian with accents
>> <https://github.com/tesseract-ocr/langdata/tree/main/rus_accent>
>>
>> The problem is that I cannot find a way to turn it into a traineddata
>> file, so that I can test it and later use it in my code. I found the
>> tesstrain <https://github.com/tesseract-ocr/tesstrain> repository, but was
>> not able to make it work with the data I found.
>>
>> I honestly don't know whether I am missing something, misunderstanding
>> something, or whether data is simply no longer trained with these types of
>> files.
>>
>> If you have any clue, that would help me a lot.
>>
>> Thank you!