Thanks a lot, Shree. I'll look into it.

On Thursday, May 30, 2019 at 14:39:52 UTC+9, shree wrote:
>
> See https://github.com/Shreeshrii/tessdata_shreetest
>
> Look at the files engrestrict*.* and also
> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
>
> Create training text of about 100 lines and finetune for 400 lines
>
> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <elmago...@gmail.com> wrote:
>
>> I had about 14 lines as attached. How many lines would you recommend?
>>
>> Fine tuning gives much better results, but it tends to pick characters
>> other than those in E13B, which has only 14 characters: 0 through 9 and
>> 4 symbols. I thought training from scratch would eliminate such confusion.
>>
>> On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>>>
>>> For training from scratch a large training text and hundreds of
>>> thousands of iterations are recommended.
>>>
>>> If you are just fine tuning for a font, try to follow the instructions
>>> for training for Impact, with your font.
>>>
>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com> wrote:
>>>
>>>> Thanks, Shree.
>>>>
>>>> Yes, I saw the instruction.
>>>> The steps I took are as follows:
>>>>
>>>> Using tesstrain.sh:
>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>
>>>> Training from scratch:
>>>> mkdir -p ~/tesstutorial/e13boutput
>>>> src/training/lstmtraining --debug_interval 100 \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>
>>>> Test with base_checkpoint:
>>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>
>>>> Combining output files:
>>>> src/training/lstmtraining --stop_training \
>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>
>>>> Test with eng.traineddata:
>>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>
>>>> The training from scratch ended as:
>>>>
>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%,
>>>> word train=0%, skip ratio=0%, New best char error = 0 wrote best
>>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote
>>>> checkpoint.
>>>>
>>>> The test with base_checkpoint returns nothing, as:
>>>>
>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>
>>>> The test with eng.traineddata and e13b.png returns out.txt. Both files
>>>> are attached.
>>>>
>>>> Training seems to have worked fine. I don't know how to interpret the
>>>> test result from base_checkpoint. The generated eng.traineddata obviously
>>>> doesn't work well. I suspect the choice of --traineddata in combining
>>>> output files is bad, but I have no clue.
>>>>
>>>> Regards,
>>>> ElMagoElGato
>>>>
>>>> BTW, I referred to your tess4training in the process. It helped a lot.
>>>>
>>>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>>>
>>>>> see
>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>>
>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I wish to make trained data for the E13B font.
>>>>>>
>>>>>> I read the training tutorial and made a base_checkpoint file
>>>>>> according to the method in Training From Scratch. Now, how can I make
>>>>>> trained data from the base_checkpoint file?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesser...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
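
For reading the training log quoted in the thread, here is a minimal shell sketch that pulls the first iteration counter, the mean RMS, and the char train error out of lstmtraining log lines. The helper name summarize_log is hypothetical, and the pattern is derived only from the single log line quoted in the thread, so treat it as an assumption that other log lines follow the same layout.

```shell
#!/bin/sh
# summarize_log: hypothetical helper, not part of Tesseract.
# Reads lstmtraining log text on stdin and prints, for each matching line:
#   <first iteration counter> <mean rms %> <char train error %>
# ASSUMPTION: the pattern is based on the one log line quoted in the thread.
summarize_log() {
    sed -n -E 's#.*At iteration ([0-9]+)/[0-9]+/[0-9]+, Mean rms=([0-9.]+)%.*char train=([0-9.]+)%.*#\1 \2 \3#p'
}

# Example using the line from the thread:
echo 'At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, word train=0%, skip ratio=0%, New best char error = 0' | summarize_log
# prints: 561 0.159 0
```

With the paths used in the thread it could be run as `summarize_log < ~/tesstutorial/e13boutput/basetrain.log`. Note that the quoted commands pass the same list file as both --train_listfile and --eval_listfile, so a 0% figure there describes the training lines rather than unseen data.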
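
Regarding the suggestion in the thread to create a training text of about 100 lines: a rough shell sketch that generates 100 lines of random groups drawn from a 14-character set. The digits 0 through 9 come from the thread; the letters A to D standing in for the four E13B symbols, the output file name, and the group and line lengths are all assumptions for illustration. Replace the letters with whatever characters your font maps to its symbol glyphs.

```shell
#!/bin/sh
# Characters to draw from. Digits are from the thread; "ABCD" is an
# ASSUMPTION standing in for the four E13B symbols. Replace it with the
# characters your font actually uses for those glyphs.
CHARSET="0123456789ABCD"
LINES_WANTED=100                        # line count suggested in the thread
OUT="${OUT:-eng.e13b.training_text}"    # hypothetical output file name

awk -v chars="$CHARSET" -v n="$LINES_WANTED" 'BEGIN {
    srand()                                   # seed from the current time
    for (i = 0; i < n; i++) {
        line = ""
        groups = 3 + int(rand() * 3)          # 3 to 5 groups per line
        for (g = 0; g < groups; g++) {
            len = 4 + int(rand() * 7)         # 4 to 10 characters per group
            word = ""
            for (c = 0; c < len; c++)
                word = word substr(chars, 1 + int(rand() * length(chars)), 1)
            line = line (g ? " " : "") word
        }
        print line
    }
}' > "$OUT"
```

The resulting file could then be passed to tesstrain.sh through --training_text, as in the command quoted in the thread. Random groups are only one way to build such a text; real-world strings in the target format may serve better.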
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1f070094-8982-46ce-837b-0ef03c39e14a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.