Thanks for your reply. I have read about Tesseract 4.0, and Ray mentioned how he used many files to train it, but I don't want to use Tesseract 4.0; I want to know about Tesseract 3.05.00. From my understanding, for the eng language the eng.training_text file is built from the eng.wordlist file in langdata. For a new language, how can I build the training text from my new language's wordlist? Any idea who created the eng.training_text file? Is there a rule or algorithm for doing so, or is it randomly generated from eng.wordlist while maintaining a minimum of 10 occurrences of each character in the training text?
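To make the question concrete, the "random generation with a minimum character count" idea could be sketched roughly as follows. This is only my assumption about one possible approach, not how eng.training_text was actually produced; `build_training_text` and its parameters are made-up names for illustration.

```python
import random
from collections import Counter


def build_training_text(words, min_count=10, words_per_line=10, seed=0,
                        max_words=100000):
    """Hypothetical helper: sample words until every character that occurs
    in the word list appears at least `min_count` times in the output."""
    rng = random.Random(seed)
    words = [w for w in words if w]
    if not words:
        raise ValueError("word list is empty")

    # Index words by the characters they contain, so that characters
    # that are still under-represented can be targeted directly.
    by_char = {}
    for w in words:
        for ch in set(w):
            by_char.setdefault(ch, []).append(w)

    counts = Counter()
    chosen = []
    while len(chosen) < max_words:
        lacking = [ch for ch in by_char if counts[ch] < min_count]
        if not lacking:
            break
        # Take the character with the lowest count so far (ties broken by
        # the character itself) and pick a random word that contains it.
        ch = min(lacking, key=lambda c: (counts[c], c))
        word = rng.choice(by_char[ch])
        chosen.append(word)
        counts.update(word)

    # Break the sampled words into lines of a fixed number of words.
    lines = [" ".join(chosen[i:i + words_per_line])
             for i in range(0, len(chosen), words_per_line)]
    return "\n".join(lines)
```

The input would be the lines of the new language's wordlist file (one word per line) and the result would be written out as the training text. Note that this only guarantees character coverage; it does nothing to make the text representative of real text in the language.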
Please clarify this and let me know how to generate the training_text.

On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
> Just a word list is not enough for training text.
>
> For tesseract 4.0.0 it needs to be representative of the text to be
> recognized.
>
> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>
>> Is there any program to generate it? I see ambiguous_words.cpp
>> generating dictionary words and ambiguous words; where is it used? Or
>> can it be used to build the unicharambigs file to generate rules?