See https://github.com/Shreeshrii/tessdata_shreetest

Look at the files engrestrict*.* and also
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text

Create a training text of about 100 lines and fine-tune for about 400 iterations.
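
As a rough sketch of the suggestion above, the helper below generates a training text of 100 random lines restricted to a fixed character set, and a second helper prints (without running) a fine-tuning command. Several things here are assumptions rather than anything from this thread: the function names are made up, the letters A, B, C, D are only stand-ins for the four E13B symbols (the real mapping depends on the font), the "400" is read as a value for --max_iterations, and every path passed to the second helper is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print N lines of random text drawn only from CHARS.
# CHARS defaults to the ten digits plus A, B, C, D as placeholders for the
# four E13B symbols; check which characters your font actually maps them to.
make_training_text() {
  n="${1:-100}"
  chars="${2:-0123456789ABCD}"
  seed="${3:-1}"
  awk -v n="$n" -v chars="$chars" -v seed="$seed" 'BEGIN {
    srand(seed)
    len = length(chars)
    for (i = 0; i < n; i++) {
      line = ""
      groups = 3 + int(rand() * 4)      # 3 to 6 groups per line
      for (g = 0; g < groups; g++) {
        size = 1 + int(rand() * 9)      # 1 to 9 characters per group
        for (c = 0; c < size; c++)
          line = line substr(chars, 1 + int(rand() * len), 1)
        if (g < groups - 1) line = line " "
      }
      print line
    }
  }'
}

# Hypothetical helper: print a fine-tuning command for review (dry run).
# Arguments (all placeholders): extracted .lstm model, starter traineddata,
# training list file, output prefix. Paths containing spaces are not quoted.
print_finetune_cmd() {
  printf 'lstmtraining --continue_from %s --traineddata %s --train_listfile %s --model_output %s --max_iterations 400\n' \
    "$1" "$2" "$3" "$4"
}
```

For example, `make_training_text 100 > eng.training_e13b_text` would write the text to a file, which can then be passed to tesstrain.sh with --training_text. Real training text should probably mimic the layout of actual E13B lines more closely than uniform random groups do.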



On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <elmagoelg...@gmail.com>
wrote:

> I had about 14 lines, as attached.  How many lines would you recommend?
>
> Fine-tuning gives a much better result, but it tends to pick characters
> outside E13B, which has only 14 characters: 0 through 9 and 4 symbols.  I
> thought training from scratch would eliminate such confusion.
>
> On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>>
>> For training from scratch a large training text and hundreds of thousands
>> of iterations are recommended.
>>
>> If you are just fine-tuning for a font, try to follow the instructions for
>> training for Impact, with your font.
>>
>>
>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com> wrote:
>>
>>> Thanks, Shree.
>>>
>>> Yes, I saw the instructions.  The steps I took are as follows:
>>>
>>> Using tesstrain.sh:
>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>   --linedata_only \
>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>
>>> Training from scratch:
>>> mkdir -p ~/tesstutorial/e13boutput
>>> src/training/lstmtraining --debug_interval 100 \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>
>>> Test with base_checkpoint:
>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>
>>> Combining output files:
>>> src/training/lstmtraining --stop_training \
>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>
>>> Test with eng.traineddata:
>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>
>>>
>>> The training from scratch ended as:
>>>
>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%,
>>> word train=0%, skip ratio=0%,  New best char error = 0 wrote best
>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote
>>> checkpoint.
>>>
>>>
>>> The test with base_checkpoint returns nothing as:
>>>
>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>
>>>
>>> The test with eng.traineddata and e13b.png returns out.txt.  Both files
>>> are attached.
>>>
>>> Training seems to have worked fine.  I don't know how to interpret the
>>> test result from base_checkpoint.  The generated eng.traineddata obviously
>>> doesn't work well.  I suspect the choice of --traineddata when combining
>>> the output files is bad, but I have no clue.
>>>
>>> Regards,
>>> ElMagoElGato
>>>
>>> BTW, I referred to your tess4training in the process.  It helped a lot.
>>>
>>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>>
>>>> see
>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>
>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I wish to make a traineddata file for the E13B font.
>>>>>
>>>>> I read the training tutorial and made a base_checkpoint file according
>>>>> to the method in Training From Scratch.  Now, how can I make a
>>>>> traineddata file from the base_checkpoint file?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>


