Hi Shree, thanks for replying.

For tesseract *3.05.00*

I had already checked that link. It says:
*"Make sure there are a minimum number of samples of each character. 10 is
good, but 5 is OK for rare characters.*
*There should be more samples of the more frequent characters - at least
20.*
*Don't make the mistake of grouping all the non-letters together. Make the
text more realistic"*

Does this also hold for the langdata eng.training_text? If so, does that mean
the text is generated randomly? How can randomly generated training text
ensure accuracy?
They also say each character should have a minimum of 10 samples. Why is
that, and where in the code is this criterion used? I have checked the code
but could not find it anywhere. Is it related to an algorithm? If so, which
one: the adaptive classifier or the shape classifier? Or is it related to
the bounding box coordinates?
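
Even without finding the criterion in the code, the quoted guideline can at least be checked against a candidate training text. Below is a minimal sketch in Python, assuming the training text is a UTF-8 file. The function names and the default threshold of 10 are hypothetical, chosen here for illustration only; nothing in it comes from the Tesseract sources.

```python
from collections import Counter


def undersampled_chars(text, min_count=10):
    """Return {char: count} for non-whitespace characters that occur
    fewer than min_count times in text (threshold is an assumption
    taken from the wiki guideline, not from Tesseract code)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in counts.items() if n < min_count}


def check_training_text(path, min_count=10):
    """Read a UTF-8 training text file and return its undersampled
    characters, rarest first, as a list of (char, count) pairs."""
    with open(path, encoding="utf-8") as f:
        rare = undersampled_chars(f.read(), min_count)
    return sorted(rare.items(), key=lambda item: item[1])
```

Calling check_training_text("eng.training_text") would list the characters that fall below the threshold, which is one way to see whether a given training text follows the guideline.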

Please clear up my doubts and, if required, please bring in Ray or someone
from the dev team, as I have questions about the Tesseract code as well.
I could not post in the tesseract-dev forum because questions are supposed
to be asked on the tesseract user list only.

How, then, can I get a Tesseract developer to answer my question? Please
tell me how.

Thanks again for your timely reply and help.




On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> see https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <meh...@gmail.com> wrote:
>
>> Thanks for your reply. I have read about Tesseract 4.0, and Ray mentioned
>> how he used so many files to train it, but I don't want to use
>> Tesseract 4.0; I want to know about Tesseract 3.05.00. From my
>> understanding, for the eng language the eng.training_text file is built
>> from the eng.wordlist file in langdata. For a new language, how can I
>> build a training text from my new language's wordlist? Any idea who
>> created the eng.training_text file? Is there a rule or algorithm for doing
>> so, or is it randomly generated from eng.wordlist while maintaining a
>> minimum of 10 occurrences of each character in the training text?
>>
>>
>>
>> Please clarify this and let me know how to generate the training_text.
>>
>> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>>
>>> Just a word list is not enough for training text.
>>>
>>> For tesseract 4.0.0 it needs to be representative of the text to be
>>> recognized.
>>>
>>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>>>
>>>> Is there any program to generate it? I see ambiguous_words.cpp
>>>> generating dictionary words and ambiguous words. Where is it used? Or
>>>> can it be used to build a unicharambigs file to generate rules?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40goo
>>>> glegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>

