Thanks a lot, Shree. I'll look into it.

On Thursday, May 30, 2019 at 14:39:52 UTC+9, shree wrote:
>
> See https://github.com/Shreeshrii/tessdata_shreetest
>
> Look at the files engrestrict*.* and also 
> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
>
> Create a training text of about 100 lines and fine-tune for 400 iterations.
>
>
>
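One possible way to build a training text of roughly the suggested size is sketched below. It only generates random groups drawn from a 14-character alphabet. The letters A-D are hypothetical stand-ins for the four E13B symbols (the thread does not say which characters the font maps them to), and the output file name, group count, and group lengths are likewise illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: build a 100-line training text from a 14-character alphabet.
# ASSUMPTION: A-D are hypothetical placeholders for the four E13B symbols;
# replace them with the characters your font actually uses.

alphabet='0123456789ABCD'          # 10 digits + 4 placeholder symbols
lines=100                          # line count suggested in the thread
out="${1:-e13b.training_text}"     # hypothetical output file name

awk -v chars="$alphabet" -v n="$lines" 'BEGIN {
  srand(42)                        # fixed seed so the file is reproducible
  for (i = 1; i <= n; i++) {
    line = ""
    for (w = 1; w <= 4; w++) {     # four space-separated groups per line
      len = 4 + int(rand() * 6)    # each group is 4 to 9 characters long
      for (c = 1; c <= len; c++)
        line = line substr(chars, 1 + int(rand() * length(chars)), 1)
      if (w < 4) line = line " "
    }
    print line
  }
}' > "$out"
```

Real check lines would presumably follow a more regular layout than random groups, so mixing in realistic samples is probably worthwhile.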
> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <elmago...@gmail.com> wrote:
>
>> My training text had about 14 lines, as attached.  How many lines would 
>> you recommend?
>>
>> Fine-tuning gives a much better result, but it tends to pick characters 
>> outside E13B, which has only 14 characters: 0 through 9 and 4 symbols.  I 
>> thought training from scratch would eliminate that confusion.
>>
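To see how often the recognizer picks characters outside the E13B set, the OCR output can be screened with a small filter. This is only a sketch: A-D are again hypothetical stand-ins for the four symbols, and `check_e13b` is an illustrative helper name, not part of Tesseract.

```shell
# Sketch: flag OCR output lines that contain characters outside the E13B set.
# ASSUMPTION: A-D are hypothetical placeholders for the four E13B symbols.
check_e13b() {
  # Prints "line-number:text" for every offending line in the file "$1".
  # Returns 0 when the file is clean, 1 when any stray character is found.
  if grep -n '[^0-9A-D[:space:]]' "$1"; then
    return 1
  fi
  return 0
}
```

For example, `check_e13b out.txt` could be run on the text file that tesseract writes, and the number of reported lines compared before and after fine-tuning.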
>> On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>>>
>>> For training from scratch a large training text and hundreds of 
>>> thousands of iterations are recommended. 
>>>
>>> If you are just fine-tuning for a font, try following the instructions 
>>> for fine-tuning for Impact, using your own font.
>>>
>>>
>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com> wrote:
>>>
>>>> Thanks, Shree.
>>>>
>>>> Yes, I saw the instructions.  The steps I took are as follows:
>>>>
>>>> Using tesstrain.sh:
>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>
>>>> Training from scratch:
>>>> mkdir -p ~/tesstutorial/e13boutput
>>>> src/training/lstmtraining --debug_interval 100 \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>
>>>> Test with base_checkpoint:
>>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>
>>>> Combining output files:
>>>> src/training/lstmtraining --stop_training \
>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>
>>>> Test with eng.traineddata:
>>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>
>>>>
>>>> The training from scratch ended with:
>>>>
>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, 
>>>> word train=0%, skip ratio=0%,  New best char error = 0 wrote best 
>>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote 
>>>> checkpoint.
>>>>
>>>>
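When watching a longer run, it can help to pull just the iteration count and the training character error out of lines like the one quoted above. The following is a minimal sketch that assumes the log line format is exactly as it appears in this thread; `parse_train_line` is an illustrative name.

```shell
# Sketch: extract "<iteration> <char train error %>" from lstmtraining log
# lines read on standard input. Lines that do not match are skipped.
# ASSUMPTION: the line format matches the one quoted in this thread.
parse_train_line() {
  sed -n 's/^At iteration \([0-9]*\)\/.*char train=\([0-9.]*\)%.*/\1 \2/p'
}
```

It could be used as `parse_train_line < ~/tesstutorial/e13boutput/basetrain.log`, where each log entry sits on a single line rather than wrapped as in this email.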
>>>> The test with base_checkpoint prints only the following:
>>>>
>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>
>>>>
>>>> The test with eng.traineddata and e13b.png returns out.txt.  Both files 
>>>> are attached.
>>>>
>>>> Training seems to have worked fine.  I don't know how to interpret the 
>>>> test result from base_checkpoint.  The generated eng.traineddata obviously 
>>>> doesn't work well.  I suspect the choice of --traineddata when combining 
>>>> the output files is wrong, but I have no clue.
>>>>
>>>> Regards,
>>>> ElMagoElGato
>>>>
>>>> BTW, I referred to your tess4training in the process.  It helped a lot.
>>>>
>>>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>>>
>>>>> see 
>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>>
>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to create a traineddata file for the E13B font.
>>>>>>
>>>>>> I read the training tutorial and made a base_checkpoint file 
>>>>>> following the method in Training From Scratch.  Now, how can I make a 
>>>>>> traineddata file from the base_checkpoint file?
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to tesser...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
