Is anybody here? Can someone help me? Thanks a lot.

On Thu, Apr 18, 2019 at 5:19 PM <yixinlucky...@gmail.com> wrote:
> Hello, everyone:
>
> I have used Tesseract 4.0 to train a chi_sim model, but the result is
> not as good as I expected, so I came up with the following way to fine-tune it.
>
> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>      --training_text ../training_data/chi_sim_layer_training_text \
>      --langdata_dir ../langdata_lstm --tessdata_dir ./tessdata --lang chi_sim \
>      --linedata_only --noextract_font_properties --exposures "0" \
>      --maxpages 0 \
>      --workspace_dir ~/share/workspace/tmp \
>      --save_box_tiff \
>      --fontlist "NSimSun" \
>      "Times New Roman" \
>      "Arial Unicode MS" \
>      "SimSun" \
>      "Noto Sans CJK SC" \
>      "Noto Sans Mono CJK SC" \
>      --output_dir ~/tesstutorial/chi_sim_train \
>      --overwrite
>
> 2. mkdir -p ~/tesstutorial/chi_sim_layer_from_chi_sim
>
> 3. combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>      ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm
>
> 4. lstmtraining --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer \
>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm \
>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>      --old_traineddata ../tessdata_best/chi_sim.traineddata \
>      --append_index 5 --net_spec '[Lfx192 O1c1]' \
>      --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>      --max_iterations 40000
>
> 5. lstmtraining --stop_training \
>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer_checkpoint \
>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata
>
> The steps above are the normal way. Then I continue fine-tuning from the
> chi_sim_layer.traineddata obtained in step 5, using OCR-D's ocrd-train
> (https://github.com/OCR-D/ocrd-train):
>
> 6. Prepare the ground-truth files (.tif images and .txt transcriptions).
> 7. Modify the ocrd-train Makefile to suit my needs.
> 8. make training
>
> Can I do it this way? Please check whether it is feasible.
>
> Thank you in advance, and sorry for my poor English.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/3a2c1647-4fd3-4766-88e3-379ccf4221dd%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/3a2c1647-4fd3-4766-88e3-379ccf4221dd%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
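For step 6, checking that every line image has a matching transcription before running `make training` can save a failed run. The sketch below is a hypothetical helper (`check_gt_pairs` is not part of ocrd-train or Tesseract). It assumes ocrd-train's naming convention, where each line image `foo.tif` sits next to a single-line transcription named `foo.gt.txt` (not plain `.txt`). The default folder `data/ground-truth` is also an assumption; point it at whatever directory your modified Makefile actually reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check line-level ground truth before `make training`.
# Assumption: each image is *.tif and its transcription has the same basename
# with the extension .gt.txt. The default directory is only an example.

check_gt_pairs() {
  local dir="${1:-data/ground-truth}"
  local missing=0 multiline=0 tif gt lines

  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue          # glob matched nothing: no .tif files
    gt="${tif%.tif}.gt.txt"
    if [ ! -f "$gt" ]; then
      echo "missing transcription: $gt"
      missing=$((missing + 1))
      continue
    fi
    # grep -c '' counts lines, including a last line without a trailing newline.
    lines=$(grep -c '' "$gt")
    if [ "$lines" -gt 1 ]; then
      echo "more than one line: $gt"
      multiline=$((multiline + 1))
    fi
  done

  echo "missing=$missing multiline=$multiline"
  # Exit status 0 only when every image has exactly one single-line transcription.
  [ "$missing" -eq 0 ] && [ "$multiline" -eq 0 ]
}
```

Usage, assuming the folder name above: `check_gt_pairs data/ground-truth && make training`. The helper only reports problems; it does not rename `.txt` files or fix multi-line transcriptions for you.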