For training from scratch, a large training text and hundreds of thousands of iterations are recommended.
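As a rough sketch of what that could look like, here is the from-scratch lstmtraining call from the quoted message below with only the iteration budget raised. The value 400000 is an illustrative assumption, not a tested recommendation, and the DRY_RUN switch and the run helper are hypothetical conveniences added here so the command can be inspected before it is launched. The paths and the net spec are copied from the quoted message.

```shell
#!/bin/sh
# Sketch only: the from-scratch lstmtraining call from the quoted message,
# with a larger iteration budget. MAX_ITERATIONS=400000 is an assumption.
EVAL_DIR="${EVAL_DIR:-$HOME/tesstutorial/e13beval}"
OUT_DIR="${OUT_DIR:-$HOME/tesstutorial/e13boutput}"
MAX_ITERATIONS="${MAX_ITERATIONS:-400000}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 runs them

# run: print the command in dry-run mode, otherwise execute it.
# The printed form drops shell quoting, so it is for inspection only.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

train_from_scratch() {
  run mkdir -p "$OUT_DIR"
  run src/training/lstmtraining --debug_interval 100 \
    --traineddata "$EVAL_DIR/eng/eng.traineddata" \
    --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
    --model_output "$OUT_DIR/base" --learning_rate 20e-4 \
    --train_listfile "$EVAL_DIR/eng.training_files.txt" \
    --eval_listfile "$EVAL_DIR/eng.training_files.txt" \
    --max_iterations "$MAX_ITERATIONS"
}

train_from_scratch
```

Run it once as is to check the printed command, then set DRY_RUN=0 to actually start training. The training text itself would also need to be much larger than a single-font sample for the extra iterations to help.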
If you are just fine tuning for a font, try following the instructions for fine tuning for impact, using your font.

On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmagoelg...@gmail.com> wrote:

> Thanks, Shree.
>
> Yes, I saw the instruction. The steps I made are as follows:
>
> Using tesstrain.sh:
> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
> --noextract_font_properties --langdata_dir ../langdata \
> --tessdata_dir ./tessdata \
> --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
> --training_text ../langdata/eng/eng.training_e13b_text
>
> Training from scratch:
> mkdir -p ~/tesstutorial/e13boutput
> src/training/lstmtraining --debug_interval 100 \
> --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
> --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
> --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
> --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
> --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>
> Test with base_checkpoint:
> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
> --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
> --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>
> Combining output files:
> src/training/lstmtraining --stop_training \
> --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
> --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
> --model_output ~/tesstutorial/e13boutput/eng.traineddata
>
> Test with eng.traineddata:
> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>
>
> The training from scratch ended as:
>
> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, word train=0%, skip ratio=0%, New best char error = 0 wrote best model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote checkpoint.
>
>
> The test with base_checkpoint returns nothing as:
>
> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>
>
> The test with eng.traineddata and e13b.png returns out.txt. Both files
> are attached.
>
> Training seems to have worked fine. I don't know how to translate the
> test result from base_checkpoint. The generated eng.traineddata obviously
> doesn't work well. I suspect the choice of --traineddata in combining
> output files is bad but I have no clue.
>
> Regards,
> ElMagoElGato
>
> BTW, I referred to your tess4training in the process. It helped a lot.
>
> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>
>> see
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>
>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I wish to make a trained data for E13B font.
>>>
>>> I read the training tutorial and made a base_checkpoint file according
>>> to the method in Training From Scratch. Now, how can I make a trained data
>>> from the base_checkpoint file?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >> >> -- >> >> ____________________________________________________________ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> > -- > You received this message because you are subscribed to the Google Groups > "tesseract-ocr" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to tesseract-ocr+unsubscr...@googlegroups.com. > To post to this group, send email to tesseract-ocr@googlegroups.com. > Visit this group at https://groups.google.com/group/tesseract-ocr. > To view this discussion on the web visit > https://groups.google.com/d/msgid/tesseract-ocr/7f29f47e-c6f5-4743-832d-94e7d28ab4e8%40googlegroups.com > <https://groups.google.com/d/msgid/tesseract-ocr/7f29f47e-c6f5-4743-832d-94e7d28ab4e8%40googlegroups.com?utm_medium=email&utm_source=footer> > . > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXBZfsQRM3nx5Pgr%3DkkS%2Bk-nsFgXC-guAk95eDh2D8sUg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.