Thanks Shree, but if Tesseract is open source, why can't the developers answer questions? If I train my model on random text, how can I arrive at a reliable accuracy figure? My model's accuracy will be random too.
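To make the question concrete, this is roughly the check I have in mind for any training text, random or not. It is only a minimal sketch: the thresholds (10 samples normally, 5 for rare characters, 20 for frequent ones) come from the wiki text quoted below, while the function name and the `rare_chars` / `frequent_chars` sets are my own assumptions and not anything taken from the Tesseract code.

```python
from collections import Counter


def check_character_coverage(text, min_count=10, rare_min=5, frequent_min=20,
                             rare_chars=frozenset(), frequent_chars=frozenset()):
    """Return {char: (count, required)} for characters below their threshold.

    Thresholds follow the wiki guidance (10 / 5 / 20). Which characters are
    "rare" or "frequent" is an assumption the caller has to supply.
    """
    # Count every non-whitespace character in the training text.
    counts = Counter(ch for ch in text if not ch.isspace())
    shortfalls = {}
    for ch, n in counts.items():
        if ch in frequent_chars:
            required = frequent_min
        elif ch in rare_chars:
            required = rare_min
        else:
            required = min_count
        if n < required:
            shortfalls[ch] = (n, required)
    return shortfalls


if __name__ == "__main__":
    sample = "aaaaaaaaaa bbbbb zz"
    # 'a' has 10 samples, 'b' is declared rare and has 5, 'z' has only 2.
    print(check_character_coverage(sample, rare_chars={"b"}))
```

This only looks at characters that actually occur in the text, so a character that is missing altogether would not be reported. It also says nothing about why those thresholds matter, which is the part I am asking about.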
I want to know the reason for the conditions imposed on the training text and how much they affect accuracy. Is there any other way to increase my model's accuracy on my own? Knowing these answers would keep random training from giving me a random model.

On Monday, April 9, 2018 at 3:19:55 PM UTC+5:30, shree wrote:
>
> For tesseract 3.05
>
> random text will work, it is suggested to use combos similar to English
> training text.
>
> It is unlikely you will get answers to your questions from the developers.
> You can search past issues/questions in forum and github.
>
> 3.05 training does not take long, run a few experiments for your
> 'language' and test.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Apr 9, 2018 at 2:15 PM, Romil Mehla <meh...@gmail.com> wrote:
>
>> Hi Shree, thanks for replying.
>>
>> For tesseract *3.05.00*
>>
>> I had already checked that link. There they mention:
>> *"Make sure there are a minimum number of samples of each character. 10
>> is good, but 5 is OK for rare characters.*
>> *There should be more samples of the more frequent characters - at least
>> 20.*
>> *Don't make the mistake of grouping all the non-letters together. Make
>> the text more realistic"*
>>
>> Does this hold for langdata's eng.training_text? If yes, that means
>> they are generating it randomly. How can randomly generated training
>> text assure accuracy?
>> Also, they mention that each character should have a minimum of 10
>> samples. Why, and where in the code is this criterion used? I have
>> checked the code but could not find it anywhere. Is it related to an
>> algorithm? If so, which one: the adaptive classifier or the shape
>> classifier? Or is it related to bounding box coordinates?
>>
>> Please clear my doubts, and if required please pull in Ray or someone
>> from the dev team, as I have doubts regarding the tesseract code as well.
>> I could not post in the tesseract-dev forum because questions should be
>> asked in the tesseract user list only.
>>
>> Then how can I have a tesseract developer answer my question? Please
>> tell me the way.
>>
>> Thanks again for your timely reply and help.
>>
>> On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>>
>>> see
>>> https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <meh...@gmail.com> wrote:
>>>
>>>> Thanks for your reply. I have read about tesseract 4.0, and Ray
>>>> mentioned how he used so many files to train it, but I don't want to
>>>> use tesseract 4.0; I want to know about tesseract 3.05.00. From my
>>>> understanding, for the eng language, the eng.training_text file is
>>>> built from the eng.wordlist file in langdata. For a new language, how
>>>> can I build training text from my new language's wordlist? Any idea
>>>> who created the eng.training_text file? Is there any rule or algorithm
>>>> to do so, or is it randomly generated from eng.wordlist while
>>>> maintaining a minimum of 10 occurrences of each character in the
>>>> training text?
>>>>
>>>> Please clarify this, and please let me know how to generate
>>>> training_text.
>>>>
>>>> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>>>>
>>>>> Just a word list is not enough for training text.
>>>>>
>>>>> For tesseract 4.0.0 it needs to be representative of the text to be
>>>>> recognized.
>>>>>
>>>>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>>>>>
>>>>>> Is there any program to generate it? I see ambiguous_words.cpp
>>>>>> generating dictionary words and ambiguous words. Where is it used?
>>>>>> or it can be used to build unicharambigs file to generate rules ?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/47ab9067-6400-46fa-9662-0cdb4f370d4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.