Re: [tesseract-ocr] Trained data for E13B font

ElGato ElMago Wed, 29 May 2019 21:09:33 -0700

I had about 14 lines as attached.  How many lines would you recommend?
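
As a rough illustration of scaling the training text beyond 14 lines, the 
sketch below generates random lines over the 14-character set.  It is not 
part of the Tesseract training tools; the function name gen_lines, the group 
sizes, and the use of A-D as stand-ins for the four E13B symbols are all 
assumptions, to be replaced with whatever mapping the font actually uses.

```shell
#!/bin/sh
# gen_lines N [SEED]: print N lines of random text over a 14-character set.
# Assumption: the letters A-D stand in for the four E13B symbols.
gen_lines() {
  awk -v n="$1" -v seed="${2:-1}" '
    BEGIN {
      srand(seed)
      chars = "0123456789ABCD"
      for (i = 0; i < n; i++) {
        line = ""
        for (w = 0; w < 4; w++) {            # four groups per line
          len = 3 + int(rand() * 8)          # 3 to 10 characters per group
          for (c = 0; c < len; c++)
            line = line substr(chars, 1 + int(rand() * length(chars)), 1)
          if (w < 3) line = line " "         # single space between groups
        }
        print line
      }
    }'
}
```

For example, `gen_lines 5000 > eng.training_e13b_text` would write 5000 
lines; whether that is enough can only be judged from the training runs.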

Fine tuning gives much better results, but it tends to pick characters 
outside E13B, which has only 14 characters: 0 through 9 and 4 symbols.  I 
thought training from scratch would eliminate such confusion.
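
One way to measure that confusion is to list every character in the OCR 
output that falls outside the 14-character set.  The sketch below is only 
an illustration: the function name stray_chars is hypothetical, and it 
assumes the four symbols are represented by the letters A-D in the output.

```shell
#!/bin/sh
# Assumption: A-D stand in for the four E13B symbols; adjust to your mapping.
ALLOWED='0123456789ABCD'

# stray_chars FILE: print each distinct character in FILE that is neither
# in ALLOWED nor a space, one per line, in byte order.
stray_chars() {
  grep -o "[^$ALLOWED ]" "$1" | LC_ALL=C sort -u
}
```

Running `stray_chars out.txt` on the recognition result prints nothing when 
every character belongs to the set.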


On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>
> For training from scratch, a large training text and hundreds of thousands 
> of iterations are recommended. 
>
> If you are just fine-tuning for a font, try following the instructions for 
> fine-tuning for Impact, using your font instead.
>
>
> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com> wrote:
>
>> Thanks, Shree.
>>
>> Yes, I saw the instructions.  The steps I took are as follows:
>>
>> Using tesstrain.sh:
>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>   --linedata_only \
>>   --noextract_font_properties --langdata_dir ../langdata \
>>   --tessdata_dir ./tessdata \
>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>   --training_text ../langdata/eng/eng.training_e13b_text
>>
>> Training from scratch:
>> mkdir -p ~/tesstutorial/e13boutput
>> src/training/lstmtraining --debug_interval 100 \
>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>
>> Test with base_checkpoint:
>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>
>> Combining output files:
>> src/training/lstmtraining --stop_training \
>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>
>> Test with eng.traineddata:
>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>
>>
>> Training from scratch ended with:
>>
>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, 
>> word train=0%, skip ratio=0%,  New best char error = 0 wrote best 
>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote 
>> checkpoint.
>>
>>
>> The test with base_checkpoint returns nothing, as shown here:
>>
>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>
>>
>> The test with eng.traineddata and e13b.png produces out.txt.  Both files 
>> are attached.
>>
>> Training seems to have worked fine.  I don't know how to interpret the 
>> test result from base_checkpoint.  The generated eng.traineddata obviously 
>> doesn't work well.  I suspect the choice of --traineddata when combining 
>> the output files is wrong, but I have no clue.
>>
>> Regards,
>> ElMagoElGato
>>
>> BTW, I referred to your tess4training in the process.  It helped a lot.
>>
>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>
>>> see 
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>
>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com> 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to create trained data for the E13B font.
>>>>
>>>> I read the training tutorial and made a base_checkpoint file following 
>>>> the method in Training From Scratch.  Now, how can I make trained data 
>>>> from the base_checkpoint file?
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesser...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> -- 
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>
>


eng.training_e13b_text
Description: Binary data
