Hi Shree,

Thanks for replying.

For tesseract *3.05.00*, I had already checked that link. There they mention:

*"Make sure there are a minimum number of samples of each character. 10 is good, but 5 is OK for rare characters.*
*There should be more samples of the more frequent characters - at least 20.*
*Don't make the mistake of grouping all the non-letters together. Make the text more realistic"*

Does this hold for the langdata eng.training_text? If yes, then that means they are generating it randomly. How can randomly generated training text assure accuracy?

They also mention that each character should have a minimum of 10 samples. Why so, and where in the code is this criterion used? I have checked the code but could not find this criterion anywhere. Is it related to an algorithm? If so, which one: the adaptive or the shape classifier, or is it related to the bounding box coordinates?

Please clear my doubts, and if required please pull in Ray or someone from the dev team as well, as I have doubts regarding the tesseract code too. I could not post in the tesseract-dev forum because doubts should be asked in the tesseract user list only. Then how can I get a tesseract developer to answer my question? Please tell me the way.

Thanks again for your timely reply and help.

On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

> see https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <meh...@gmail.com> wrote:
>
>> Thanks for your reply. I have read about tesseract 4.0, and Ray mentioned
>> how he used so many files to train tesseract 4.0, but I don't want to use
>> tesseract 4.0; I wanted to know about tesseract 3.05.00. From my
>> understanding, taking the eng language as an example, the eng.training_text
>> file is built from the eng.wordlist file mentioned in langdata.
>> For a new language, how can I build training text from my new language's
>> wordlist? Any idea who created the eng.training_text file? Is there any
>> rule or algorithm to do so, or is it randomly generated from eng.wordlist
>> by maintaining a minimum of 10 occurrences of each character in the
>> training text?
>>
>> Please clarify this, and please let me know how to generate the
>> training_text.
>>
>> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>>
>>> Just a word list is not enough for training text.
>>>
>>> For tesseract 4.0.0 it needs to be representative of the text to be
>>> recognized.
>>>
>>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>>>
>>>> Is there any program to generate it? I see ambiguous_words.cpp
>>>> generating dictionary words and ambiguous words; where is it used? Or
>>>> can it be used to build the unicharambigs file to generate rules?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
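
For reference, the per-character minimums quoted from the wiki at the top of this thread (10 samples per character, 5 for rare ones, at least 20 for frequent ones) can be checked against a candidate training text with a few lines of Python. This is only a sketch and not part of Tesseract: the function name `undersampled_chars`, its parameters, and the idea of passing a `required` character set are assumptions made for illustration, and nothing here reflects how eng.training_text was actually produced.

```python
from collections import Counter


def undersampled_chars(text, min_count=10, required=""):
    """Return {char: count} for characters seen fewer than min_count times.

    Hypothetical helper, not a Tesseract API. Whitespace is ignored.
    Characters listed in `required` but absent from `text` are reported
    with a count of 0.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    for ch in required:
        counts.setdefault(ch, 0)
    return {ch: n for ch, n in sorted(counts.items()) if n < min_count}


# Example: read a training text file and list characters below the threshold.
# The file name is an assumption; use whatever training text you are building.
#
# with open("eng.training_text", encoding="utf-8") as fh:
#     print(undersampled_chars(fh.read(), min_count=10))
```

A check like this only verifies the counts; it says nothing about whether the text is realistic, which the wiki passage asks for separately.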