Thanks, Shree.

Yes, I saw the instructions.  The steps I took are as follows:

Using tesstrain.sh:
src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata \
  --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
  --training_text ../langdata/eng/eng.training_e13b_text

Training from scratch:
mkdir -p ~/tesstutorial/e13boutput
src/training/lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log

Test with base_checkpoint:
src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
  --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt

Combining output files:
src/training/lstmtraining --stop_training \
  --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
  --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
  --model_output ~/tesstutorial/e13boutput/eng.traineddata

Test with eng.traineddata:
tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
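
Before trusting the OCR result, it may be worth listing what the combined file actually contains.  The sketch below wraps combine_tessdata -d, which lists the components of a traineddata file.  The function name inspect_traineddata and the COMBINE_TESSDATA variable are my own helpers, not part of Tesseract, and the sketch assumes the tool is either on the PATH or given explicitly.

```shell
# Hypothetical helper: list the components packed into a traineddata file.
# COMBINE_TESSDATA may point at the binary (e.g. src/training/combine_tessdata);
# otherwise combine_tessdata is looked up on the PATH.
inspect_traineddata() {
  file="$1"
  tool="${COMBINE_TESSDATA:-combine_tessdata}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "cannot find $tool" >&2
    return 2
  fi
  # -d prints the directory of components stored in the file.
  "$tool" -d "$file"
}

# Example (not run here):
#   inspect_traineddata ~/tesstutorial/e13boutput/eng.traineddata
```

If no LSTM component shows up in the listing, the combining step would be the first thing to look at.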


The training from scratch ended as:

At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, word train=0%, skip ratio=0%,  New best char error = 0 wrote best model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote checkpoint.
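
To follow the error rate over a longer run, the last progress line can be pulled out of the log.  This is only a sketch that assumes the log keeps the "char train=N%" format shown above; the function name is my own.

```shell
# Hypothetical helper: print the "char train" error (in percent) from the
# last progress line of an lstmtraining log.
# Assumes each progress line contains "char train=<number>%".
last_char_train_error() {
  grep 'char train=' "$1" | tail -n 1 |
    sed -n 's/.*char train=\([0-9.]*\)%.*/\1/p'
}

# Example (not run here):
#   last_char_train_error ~/tesstutorial/e13boutput/basetrain.log
```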


The test with base_checkpoint prints nothing except:

At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0


The test with eng.traineddata and e13b.png produces out.txt.  Both files are 
attached.

Training seems to have worked fine.  I don't know how to interpret the test 
result from base_checkpoint.  The generated eng.traineddata obviously 
doesn't work well.  I suspect the choice of --traineddata when combining 
the output files is wrong, but I have no clue.
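
An error rate of exactly 0 in both training and evaluation also makes me wonder whether the list file points at usable data.  That is only a guess on my part.  A minimal check, assuming the list file holds one .lstmf path per line (which is how tesstrain.sh writes it); the function name is my own:

```shell
# Hypothetical helper: count the entries in a training list file and report
# any listed file that is missing or empty.
# Assumes one .lstmf path per line.
check_listfile() {
  listfile="$1"
  total=0
  bad=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue          # skip blank lines
    total=$((total + 1))
    if [ ! -s "$path" ]; then           # -s: exists and has size > 0
      echo "missing or empty: $path" >&2
      bad=$((bad + 1))
    fi
  done < "$listfile"
  echo "$total listed, $bad missing or empty"
  # Succeed only if there is at least one entry and none are bad.
  [ "$total" -gt 0 ] && [ "$bad" -eq 0 ]
}

# Example (not run here):
#   check_listfile ~/tesstutorial/e13beval/eng.training_files.txt
```

Since I passed the same list to --train_listfile and --eval_listfile, one check covers both.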

Regards,
ElMagoElGato

BTW, I referred to your tess4training in the process.  It helped a lot.

On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>
> see 
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>
> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to create a traineddata file for the E13B font.
>>
>> I read the training tutorial and made a base_checkpoint file following 
>> the method in Training From Scratch.  Now, how can I make a traineddata 
>> file from the base_checkpoint file?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesser...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

 
