Thanks Shree, but if Tesseract is open source, why can't the developers 
answer questions? If I train my model on random text, how can I arrive at a 
reliable accuracy figure for it? My model's accuracy will then be random too. 
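
One way to keep the accuracy number itself from being random is to score every training experiment against the same fixed set of test pages with known ground truth. Below is a minimal sketch of a character error rate, assuming the OCR output and the ground truth are already available as plain strings. The names `edit_distance` and `char_error_rate` are hypothetical helpers for illustration, not part of Tesseract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a ref character
                            curr[j - 1] + 1,      # add a hyp character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Lower is better. Comparing this number across experiments on the same test pages shows whether a change to the training text actually helped.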

I want to know the reason for the conditions imposed on the training text and 
how much they affect accuracy. Is there any other way I can improve my 
model's accuracy myself? Knowing these answers would keep random training 
from giving me a random model.
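
The sample-count guidance from the training wiki (quoted further down in this thread: 10 per character, 5 for rare ones, at least 20 for frequent ones) can at least be checked mechanically before training. A rough sketch, assuming the training text has already been read into a string; the function names and the default threshold are my own illustration, not anything in the Tesseract code.

```python
from collections import Counter
from typing import Dict, List


def undersampled_chars(text: str, min_count: int = 10) -> Dict[str, int]:
    """Return non-whitespace characters that occur fewer than min_count times."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in sorted(counts.items()) if n < min_count}


def missing_chars(text: str, charset: str) -> List[str]:
    """Return characters of the target charset that never occur in the text."""
    present = set(text)
    return sorted(ch for ch in set(charset) if ch not in present)


if __name__ == "__main__":
    # Tiny made-up sample; a real run would load the training text file instead.
    sample = "the quick brown fox jumps over the lazy dog " * 3
    print(undersampled_chars(sample, min_count=10))
    print(missing_chars(sample, "abcdefghijklmnopqrstuvwxyz0123456789"))
```

This only checks the counts. It says nothing about whether the text is realistic, which the wiki also asks for.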

On Monday, April 9, 2018 at 3:19:55 PM UTC+5:30, shree wrote:
>
> For Tesseract 3.05:
>
> Random text will work, but it is suggested to use combinations similar to 
> the English training text.
>
> It is unlikely that you will get answers to your questions from the 
> developers. You can search past issues and questions in the forum and on 
> GitHub.
>
> 3.05 training does not take long, so run a few experiments for your 
> 'language' and test.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Apr 9, 2018 at 2:15 PM, Romil Mehla <meh...@gmail.com> wrote:
>
>> Hi Shree, thanks for replying.
>>
>> For Tesseract *3.05.00*:
>>
>> I had already checked that link. There they mention: 
>> *"Make sure there are a minimum number of samples of each character. 10 
>> is good, but 5 is OK for rare characters.*
>> *There should be more samples of the more frequent characters - at least 
>> 20.*
>> *Don't make the mistake of grouping all the non-letters together. Make 
>> the text more realistic"*
>>
>> Does this hold for the langdata eng.training_text? If yes, that means 
>> they are generating it randomly. How can randomly generated training text 
>> assure accuracy?
>> Also, they mention that each character should have a minimum of 10 
>> samples. Why so, and where in the code is this criterion used? I have 
>> checked the code but could not find it anywhere. Is it related to an 
>> algorithm? If so, which one: the adaptive classifier or the shape 
>> classifier, or is it related to bounding box coordinates?
>>
>> Please clear my doubts and, if required, please pull in Ray or someone 
>> from the dev team, as I have doubts regarding the Tesseract code as well.
>> I could not post in the tesseract-dev forum because questions should be 
>> asked in the tesseract-user list only.
>>
>> So how can I get a Tesseract developer to answer my question? Please tell 
>> me the way.
>>
>> Thanks again for your timely reply and help.
>>
>>
>>
>>
>> On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>>
>>> see  
>>> https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>>>
>>> ShreeDevi
>>>
>>> On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <meh...@gmail.com> wrote:
>>>
>>>> Thanks for your reply. I have read about Tesseract 4.0, and Ray 
>>>> mentioned how he used so many files to train it, but I don't want 
>>>> to use Tesseract 4.0. I wanted to know about Tesseract 3.05.00. From my 
>>>> understanding, for the eng language, say, the eng.training_text file is 
>>>> built from the eng.wordlist file in langdata. For a new language, how 
>>>> can I build training text from my new language's wordlist? Any idea who 
>>>> created the eng.training_text file? Is there any rule or algorithm to do 
>>>> so, or is it randomly generated from eng.wordlist while maintaining a 
>>>> minimum of 10 occurrences of each character in the training text?
>>>>
>>>> Please clarify this, and please let me know how to generate the 
>>>> training_text.
>>>>
>>>> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>>>>
>>>>> Just a word list is not enough for training text.
>>>>>
>>>>> For tesseract 4.0.0 it needs to be representative of the text to be 
>>>>> recognized.
>>>>>
>>>>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>>>>>
>>>>>> Is there any program to generate it? I see ambiguous_words.cpp 
>>>>>> generating dictionary words and ambiguous words. Where is it used? Or 
>>>>>> can it be used to build the unicharambigs file to generate rules?
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>
>>>
>>>
>>
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/47ab9067-6400-46fa-9662-0cdb4f370d4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
