> Why not just use ocrd for fine-tuning? Just set your START_MODEL to chi_sim.

Because I have already trained a chi_sim model with Tesseract-OCR, and I don't have many sample images.
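For reference, here is a hedged sketch of what the START_MODEL suggestion could look like. It assumes that the ocrd-train Makefile exposes MODEL_NAME, START_MODEL and TESSDATA variables, so check these names against the Makefile in your own checkout. Because START_MODEL names an existing model, it could presumably point at the chi_sim_layer.traineddata produced in step 5 of the original message instead of the stock chi_sim, which would cover the case of already having a trained model. The function name and the model name chi_sim_custom are hypothetical. The script only prints the command (a dry run) and runs nothing.

```shell
#!/bin/sh
# Dry-run sketch: print an ocrd-train fine-tuning command that continues
# from an existing model. ASSUMPTION: the Makefile variables MODEL_NAME,
# START_MODEL and TESSDATA exist in your ocrd-train checkout; the model
# name "chi_sim_custom" below is hypothetical.

ocrd_finetune_cmd() {
  model_name=$1    # name for the new model (hypothetical)
  start_model=$2   # base model name, without the .traineddata suffix
  tessdata_dir=$3  # directory expected to hold ${start_model}.traineddata
  if [ ! -f "$tessdata_dir/$start_model.traineddata" ]; then
    echo "warning: $tessdata_dir/$start_model.traineddata not found" >&2
  fi
  printf 'make training MODEL_NAME=%s START_MODEL=%s TESSDATA=%s\n' \
    "$model_name" "$start_model" "$tessdata_dir"
}

# Continue from the layer model written by step 5 rather than stock chi_sim.
ocrd_finetune_cmd chi_sim_custom chi_sim_layer \
  "$HOME/tesstutorial/chi_sim_layer_from_chi_sim"
```

The sketch itself executes nothing. You would paste the printed line into a shell inside the ocrd-train directory. The warning on stderr flags a START_MODEL that does not match any file in the given directory.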
On Mon, Apr 22, 2019 at 8:34 PM, Shanshan Wang <cool...@gmail.com> wrote:

> Why not just use ocrd for fine-tuning? Just set your START_MODEL
> to chi_sim.
>
> On Sun, Apr 21, 2019 at 19:34 易鑫 <yixinlucky...@gmail.com> wrote:
>
>> No, I want to fine-tune using actual images.
>>
>> On Fri, Apr 19, 2019 at 7:07 PM, suraa syss <suras...@gmail.com> wrote:
>>
>>> You should prepare the unicharset before LSTM training.
>>>
>>> On Thursday, 18 April 2019 14:49:20 UTC+5:30, yixinl...@gmail.com wrote:
>>>>
>>>> Hello, everyone:
>>>> I have used Tesseract 4.0 to train a chi_sim model, but the result is
>>>> not as good as I expected, so I came up with a way to fine-tune it.
>>>>
>>>> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>>      --training_text ../training_data/chi_sim_layer_training_text \
>>>>      --langdata_dir ../langdata_lstm --tessdata_dir ./tessdata \
>>>>      --lang chi_sim --linedata_only --noextract_font_properties \
>>>>      --exposures "0" \
>>>>      --maxpages 0 \
>>>>      --workspace_dir ~/share/workspace/tmp \
>>>>      --save_box_tiff \
>>>>      --fontlist "NSimSun" \
>>>>        "Times New Roman" \
>>>>        "Arial Unicode MS" \
>>>>        "SimSun" \
>>>>        "Noto Sans CJK SC" \
>>>>        "Noto Sans Mono CJK SC" \
>>>>      --output_dir ~/tesstutorial/chi_sim_train \
>>>>      --overwrite
>>>>
>>>> 2. mkdir -p ~/tesstutorial/chi_sim_layer_from_chi_sim
>>>>
>>>> 3. combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>>      ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm
>>>>
>>>> 4. lstmtraining \
>>>>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer \
>>>>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm \
>>>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>>      --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>>>      --append_index 5 --net_spec '[Lfx192 O1c1]' \
>>>>      --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>>      --max_iterations 40000
>>>>
>>>> 5. lstmtraining --stop_training \
>>>>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer_checkpoint \
>>>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata
>>>>
>>>> The steps above are the normal way. I then continue fine-tuning from the
>>>> chi_sim_layer.traineddata obtained in step 5, using OCR-D
>>>> (https://github.com/OCR-D/ocrd-train):
>>>>
>>>> 6. Prepare the ground-truth files (both .tif and .txt files).
>>>> 7. Modify the OCR-D Makefile to suit my needs.
>>>> 8. make training
>>>>
>>>> Can I do it this way? Please check whether it is feasible.
>>>>
>>>> Thank you in advance. Sorry for my poor English.
>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/c09249de-79ff-4499-bd92-d459f99321e9%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/c09249de-79ff-4499-bd92-d459f99321e9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
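Step 6 in the original message leaves the ground-truth layout implicit. ocrd-train pairs each line image with a transcription that has the same base name. The sketch below assumes the NAME.tif / NAME.gt.txt convention described in its README, so check your checkout's README for the exact extensions and directory. The function name check_gt_pairs is hypothetical. It reports every image whose transcription is missing or empty, so those problems show up before a long `make training` run rather than during it.

```shell
#!/bin/sh
# Sketch: verify that every line image NAME.tif in a ground-truth directory
# has a non-empty transcription NAME.gt.txt next to it.
# ASSUMPTION: this .tif/.gt.txt naming follows the ocrd-train README; adjust
# the extensions if your checkout expects something else.

check_gt_pairs() {
  gt_dir=$1
  missing=0
  for img in "$gt_dir"/*.tif; do
    [ -e "$img" ] || continue          # no .tif files: the glob stays literal
    txt="${img%.tif}.gt.txt"
    if [ ! -s "$txt" ]; then           # missing or zero-length transcription
      echo "missing or empty transcription: $txt" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing image(s) without a usable transcription" >&2
  [ "$missing" -eq 0 ]                 # status 0 only when every pair is complete
}
```

Because the function signals the result through its exit status, you can chain it in front of training. For example, `check_gt_pairs <your ground-truth dir> && make training ...` starts training only when every image has a transcription.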