Re: [tesseract-ocr] Trained data for E13B font

Shree Devi Kumar Wed, 29 May 2019 18:43:27 -0700

For training from scratch, a large training text and hundreds of thousands
of iterations are recommended.
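
As a rough sketch of what that means for the command quoted below: the same
lstmtraining call, with only the iteration budget raised. The paths and
net_spec are copied from the quoted message. The value 400000 is only an
illustrative assumption, and the run helper is a hypothetical wrapper that
prints the command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: paths and net_spec come from the quoted message below.
# MAX_ITER=400000 is an illustrative assumption, not a tested value.
OUT="$HOME/tesstutorial/e13boutput"
EVAL="$HOME/tesstutorial/e13beval"
MAX_ITER="${MAX_ITER:-400000}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
# The printed form is for inspection only (arguments are not re-quoted).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

train_from_scratch() {
  run src/training/lstmtraining --debug_interval 100 \
    --traineddata "$EVAL/eng/eng.traineddata" \
    --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
    --model_output "$OUT/base" --learning_rate 20e-4 \
    --train_listfile "$EVAL/eng.training_files.txt" \
    --eval_listfile "$EVAL/eng.training_files.txt" \
    --max_iterations "$MAX_ITER"
}

train_from_scratch
```

A larger iteration budget is only useful together with a much larger
training text, so that the list file names many more line images.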


If you are just fine-tuning for a font, try following the tutorial's
instructions for fine-tuning for Impact, using your own font instead.
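
A minimal sketch of those steps, adapted to the paths in the quoted message
below. Assumptions: tessdata/best/eng.traineddata is the tessdata_best
English model, the binaries live under src/training as in the quoted
commands, and the output directory name and iteration count are only
illustrative. The run helper is a hypothetical wrapper that prints each
command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only, modelled on the tutorial's "fine tuning for Impact" steps.
# BEST, OUT and the iteration count are assumptions for illustration.
BEST="${BEST:-tessdata/best/eng.traineddata}"
EVAL="$HOME/tesstutorial/e13beval"
OUT="$HOME/tesstutorial/e13b_from_full"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

fine_tune() {
  run mkdir -p "$OUT"
  # Extract the LSTM model from the existing traineddata.
  run src/training/combine_tessdata -e "$BEST" "$OUT/eng.lstm"
  # Continue training from that model on the line data for the new font.
  run src/training/lstmtraining --model_output "$OUT/e13b" \
    --continue_from "$OUT/eng.lstm" \
    --traineddata "$BEST" \
    --train_listfile "$EVAL/eng.training_files.txt" \
    --max_iterations 400
}

fine_tune
```

This only fits if every E13B character is already in the base model's
character set. If the special E13B symbols are missing, the tutorial's
section on fine-tuning for a few added characters is likely the closer match.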


On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmagoelg...@gmail.com> wrote:

> Thanks, Shree.
>
> Yes, I saw the instructions.  The steps I took are as follows:
>
> Using tesstrain.sh:
> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>   --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata \
>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>   --training_text ../langdata/eng/eng.training_e13b_text
>
> Training from scratch:
> mkdir -p ~/tesstutorial/e13boutput
> src/training/lstmtraining --debug_interval 100 \
>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>
> Test with base_checkpoint:
> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>
> Combining output files:
> src/training/lstmtraining --stop_training \
>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>
> Test with eng.traineddata:
> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>
>
> The training from scratch ended with this message:
>
> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, word
> train=0%, skip ratio=0%,  New best char error = 0 wrote best
> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote
> checkpoint.
>
>
> The test with base_checkpoint returns nothing meaningful:
>
> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>
>
> The test with eng.traineddata and e13b.png produces out.txt.  Both files
> are attached.
>
> Training seems to have worked fine.  I don't know how to interpret the
> test result from base_checkpoint.  The generated eng.traineddata obviously
> doesn't work well.  I suspect the --traineddata I chose when combining the
> output files is wrong, but I am not sure.
>
> Regards,
> ElMagoElGato
>
> BTW, I referred to your tess4training in the process.  It helped a lot.
>
> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>
>> see
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>
>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I wish to make trained data for the E13B font.
>>>
>>> I read the training tutorial and made a base_checkpoint file according
>>> to the method in Training From Scratch.  Now, how can I make a
>>> traineddata file from the base_checkpoint file?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>

