See https://github.com/Shreeshrii/tessdata_shreetest
Look at the files engrestrict*.* and also
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text

Create training text of about 100 lines and finetune for 400 lines

On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <elmagoelg...@gmail.com> wrote:

> I had about 14 lines as attached. How many lines would you recommend?
>
> Fine tuning gives a much better result, but it tends to pick characters
> other than those in E13B, which has only 14 characters: 0 through 9 and
> 4 symbols. I thought training from scratch would eliminate such confusion.
>
> On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>>
>> For training from scratch a large training text and hundreds of thousands
>> of iterations are recommended.
>>
>> If you are just fine tuning for a font, try to follow the instructions for
>> training for Impact, with your font.
>>
>>
>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com> wrote:
>>
>>> Thanks, Shree.
>>>
>>> Yes, I saw the instructions. The steps I took are as follows:
>>>
>>> Using tesstrain.sh:
>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>
>>> Training from scratch:
>>> mkdir -p ~/tesstutorial/e13boutput
>>> src/training/lstmtraining --debug_interval 100 \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>
>>> Test with base_checkpoint:
>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>
>>> Combining output files:
>>> src/training/lstmtraining --stop_training \
>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>
>>> Test with eng.traineddata:
>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>
>>>
>>> The training from scratch ended as:
>>>
>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%,
>>> word train=0%, skip ratio=0%, New best char error = 0 wrote best
>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote
>>> checkpoint.
>>>
>>>
>>> The test with base_checkpoint reports no errors:
>>>
>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>
>>>
>>> The test with eng.traineddata and e13b.png returns out.txt. Both files
>>> are attached.
>>>
>>> Training seems to have worked fine. I don't know how to interpret the
>>> test result from base_checkpoint. The generated eng.traineddata obviously
>>> doesn't work well. I suspect the choice of --traineddata in combining
>>> output files is bad, but I have no clue.
>>>
>>> Regards,
>>> ElMagoElGato
>>>
>>> BTW, I referred to your tess4training in the process. It helped a lot.
>>>
>>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>>
>>>> see
>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>
>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I wish to make trained data for the E13B font.
>>>>>
>>>>> I read the training tutorial and made a base_checkpoint file according
>>>>> to the method in Training From Scratch. Now, how can I make trained
>>>>> data from the base_checkpoint file?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
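
For reference, the fine-tuning route suggested at the top of this message could look roughly like the sketch below. It is only a sketch modelled on the tutorial's Impact example, not a tested recipe for E13B. Assumptions that are not from the thread: the location of the stock model (tessdata_best/eng.traineddata), the output directory name e13b_from_full, the helper names run and finetune_e13b, and reading "400 lines" as --max_iterations 400. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: fine-tune the stock eng model for one font instead of training from scratch.
# All paths below are assumptions unless marked "from the thread".

DRY_RUN="${DRY_RUN:-1}"                                        # 1 = print only
BIN="${BIN:-src/training}"                                     # tool location, from the thread
BASE="${BASE:-tessdata_best/eng.traineddata}"                  # assumed path to the stock model
TRAIN_DIR="${TRAIN_DIR:-$HOME/tesstutorial/e13beval}"          # from the thread
OUT_DIR="${OUT_DIR:-$HOME/tesstutorial/e13b_from_full}"        # hypothetical output directory

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

finetune_e13b() {
  run mkdir -p "$OUT_DIR"

  # 1. Extract the LSTM model from the stock traineddata.
  run "$BIN/combine_tessdata" -e "$BASE" "$OUT_DIR/eng.lstm"

  # 2. Continue training from that model on the font-specific line data.
  run "$BIN/lstmtraining" \
    --model_output "$OUT_DIR/e13b" \
    --continue_from "$OUT_DIR/eng.lstm" \
    --traineddata "$BASE" \
    --train_listfile "$TRAIN_DIR/eng.training_files.txt" \
    --max_iterations 400

  # 3. Combine the checkpoint into a usable traineddata file.
  run "$BIN/lstmtraining" --stop_training \
    --continue_from "$OUT_DIR/e13b_checkpoint" \
    --traineddata "$BASE" \
    --model_output "$OUT_DIR/eng.traineddata"
}

finetune_e13b
```

Note that in this sketch --traineddata points at the stock model in both the training and the combining step, since the character set is not being changed; whether that is the right choice for a 14-character E13B set is something to verify against the tutorial.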