Thank you very much.

>> "Train for a few epochs (100 or 1000 depending on how much data you have), stop it and check with lstmeval if the *eval score* is improving. Restart the training adding 100/1000 to the max_iterations and continue from the previous model and repeat until the eval score stops improving, or gets worse, for a few iterations."
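The quoted loop (train a fixed number of extra iterations, run lstmeval, continue until the eval score has not improved for a few rounds) can in principle be scripted. What follows is a minimal, untested bash sketch, not an official Tesseract tool. Assumptions, all labeled in the comments: `train.txt` and `eval.txt` are hypothetical list files holding a disjoint split of `chi_sim.training_files.txt`; lstmeval prints a line containing `Eval Char error rate=<number>` (check your version's output); re-running lstmtraining with the same `--model_output` resumes from `<model_output>_checkpoint` when that file exists; `STEP`, `PATIENCE` and `best_checkpoint` are made-up names. Lower character error rate is treated as better.

```shell
#!/usr/bin/env bash
# Sketch of the manual train/eval loop, automated with a "patience" rule.

# Print the last "Eval Char error rate=<x>" value found on stdin.
# ASSUMPTION: this is how lstmeval words its result line.
parse_cer() {
  sed -n 's/.*Eval Char error rate=\([0-9.]*\).*/\1/p' | tail -n 1
}

# Exit 0 if $1 is strictly lower (better) than $2, comparing as floats.
is_better() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}

train_loop() {
  local dir=$HOME/tesstutorial/chi_sim_layer_from_chi_sim
  local td=$HOME/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata
  local train_list=$HOME/tesstutorial/chi_sim_train/train.txt  # hypothetical
  local eval_list=$HOME/tesstutorial/chi_sim_train/eval.txt    # hypothetical
  local step=${STEP:-1000} patience=${PATIENCE:-3}
  local iters=0 best="" bad=0 cer

  while [ "$bad" -lt "$patience" ]; do
    iters=$((iters + step))
    # Same flags as the original post, with a growing --max_iterations.
    # ASSUMPTION: an existing "$dir/chi_sim_layer_checkpoint" is resumed.
    lstmtraining --model_output "$dir/chi_sim_layer" \
      --continue_from "$dir/chi_sim.lstm" \
      --traineddata "$td" \
      --append_index 5 --net_spec '[Lfx128 O1c1]' \
      --train_listfile "$train_list" \
      --max_iterations "$iters" || return 1

    # Evaluate the latest checkpoint on the held-out list.
    cer=$(lstmeval --model "$dir/chi_sim_layer_checkpoint" \
      --traineddata "$td" --eval_listfile "$eval_list" 2>&1 | parse_cer)
    [ -n "$cer" ] || { echo "could not parse lstmeval output" >&2; return 1; }
    echo "iterations=$iters eval char error rate=$cer"

    if [ -z "$best" ] || is_better "$cer" "$best"; then
      best=$cer; bad=0
      # Keep a copy of the best checkpoint seen so far.
      cp "$dir/chi_sim_layer_checkpoint" "$dir/best_checkpoint"
    else
      bad=$((bad + 1))   # one more round without improvement
    fi
  done
  echo "stopped after $patience rounds without improvement; best=$best"
}
```

Usage, after sourcing the file: `STEP=1000 PATIENCE=3 train_loop`. It still needs `chi_sim.lstm` extracted with combine_tessdata as in the original commands. Because the loop stops only after several non-improving rounds, the latest checkpoint is usually not the best one; that is why the sketch copies the best checkpoint aside, and the existing `lstmtraining --stop_training` command could be pointed at `best_checkpoint` instead of `chi_sim_layer_checkpoint`.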
The evaluation step is manual: you have to stop training, evaluate on the eval data, and then resume training. Is there any way to do the evaluation automatically? I mean, after each epoch we would see both the training error and the eval error. Thanks.

On Thu, Apr 18, 2019 at 1:16 AM, Shree Devi Kumar <shreesh...@gmail.com> wrote:

> >BTW, for anybody: is there a way to query a model or a checkpoint for
> the net_specs?
>
> There is no existing utility to do that. However, Ray had dumped the info
> for tessdata_fast (and partly for tessdata_best) which has been posted in
> the wiki at
> https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-in-tessdata_fast
>
>
> On Wed, Apr 17, 2019 at 1:40 PM Lorenzo Bolzani <l.bolz...@gmail.com>
> wrote:
>
>>
>> Split the data set in two parts (80/20 for example), use the large one
>> for training and the other for evaluation.
>>
>> Train for a few epochs (100 or 1000 depending on how much data you have),
>> stop it and check with lstmeval if the *eval score* is improving.
>> Restart the training adding 100/1000 to the max_iterations and continue
>> from the previous model and repeat until the eval score stops improving,
>> or gets worse, for a few iterations.
>>
>> You can use something like this for the split:
>>
>> cd train_folder/
>> ls | shuf | head -NNN | parallel mv {} eval_folder/
>>
>>
>> You can have a look here for a similar setup:
>> https://github.com/OCR-D/ocrd-train
>>
>>
>> Also you do not strictly need to use append_index for simple fine tuning,
>> have a look at ocrd-train. If you are training for weird stuff it could
>> help.
>>
>> I think
>> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-in-tessdata_fast#version-string--40000alpha--network-specification>
>> (also <https://github.com/tesseract-ocr/tesseract/issues/1404>) that the
>> fast model uses 192 for the final LSTM layer, 384 for default, 512 for the
>> best model.
>>
>>
>>
>> BTW, for anybody: is there a way to query a model or a checkpoint for the
>> net_specs?
>>
>>
>> Lorenzo
>>
>>
>>
>>
>> On Wed, Apr 17, 2019 at 05:35, <yixinlucky...@gmail.com> wrote:
>>
>>> Hello, everyone:
>>> I am now training with LSTM (Tesseract 4.0). Here are my commands:
>>>
>>> rm ~/tesstutorial/chi_sim_train -rf
>>>
>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>   --training_text ../training_data/chi_sim_layer_training_text \
>>>   --langdata_dir ../langdata_lstm --tessdata_dir ./tessdata --lang chi_sim \
>>>   --linedata_only --noextract_font_properties --exposures "0" \
>>>   --maxpages 0 \
>>>   --workspace_dir ~/share/workspace/tmp \
>>>   --save_box_tiff \
>>>   --fontlist "NSimSun" \
>>>   "Times New Roman" \
>>>   "Arial Unicode MS" \
>>>   "SimSun" \
>>>   "Noto Sans CJK SC" \
>>>   "Noto Sans Mono CJK SC" \
>>>   --output_dir ~/tesstutorial/chi_sim_train \
>>>   --overwrite
>>>
>>> rm ~/tesstutorial/chi_sim_layer_from_chi_sim -rf
>>>
>>> mkdir -p ~/tesstutorial/chi_sim_layer_from_chi_sim
>>>
>>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>   ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm
>>>
>>> lstmtraining --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer \
>>>   --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm \
>>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>   --append_index 5 --net_spec '[Lfx128 O1c1]' \
>>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>   --max_iterations 30000
>>>
>>> lstmtraining --stop_training \
>>>   --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer_checkpoint \
>>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>   --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata
>>>
>>>
>>>
>>> My question is how to decide when to stop training. I have tried many
>>> max_iterations values, but the results are not very good.
>>>
>>> Thank you in advance.
>>>
>>> Sorry for my poor English.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/92c126cb-525e-4c2f-a1c8-bbd36db09e51%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/92c126cb-525e-4c2f-a1c8-bbd36db09e51%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
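Lorenzo's split one-liner (`ls | shuf | head -NNN | parallel mv {} eval_folder/`) moves files between folders. With tesstrain.sh, lstmtraining is driven by a list file instead (`chi_sim.training_files.txt` in the commands above), so an alternative is to split that list roughly 80/20 into one list for `--train_listfile` and one for lstmeval's `--eval_listfile`, leaving the .lstmf files where they are. A minimal sketch, assuming GNU coreutils `shuf`; the function name `split_list` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shuffle a list of .lstmf paths and split it ~80/20.
# Usage: split_list ALL_LIST TRAIN_LIST EVAL_LIST
split_list() {
  local all=$1 train_out=$2 eval_out=$3
  local tmp n k
  tmp=$(mktemp) || return 1
  shuf "$all" > "$tmp"                            # random order (GNU coreutils)
  n=$(wc -l < "$tmp")                             # total number of lines
  k=$(( n / 5 ))                                  # 20% for evaluation, rounded down
  head -n "$k" "$tmp" > "$eval_out"               # first k shuffled lines -> eval
  tail -n +"$(( k + 1 ))" "$tmp" > "$train_out"   # remaining lines -> training
  rm -f "$tmp"
}
```

For example, `split_list ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt train.txt eval.txt`. Every line ends up in exactly one of the two lists, so no evaluation sample is also used for training.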