[tesseract-ocr] Re: Can I use this way for fine tuning?

suraa syss Fri, 19 Apr 2019 04:07:47 -0700

You should prepare the unicharset before LSTM training.
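
One way this could look, as a rough sketch rather than a verified recipe: unpack the unicharset from the model you continue from, extract a unicharset from the new ground truth, and merge the two. The directories data/ground-truth and data/unicharset, the run helper and the DRY_RUN switch are hypothetical names used only for illustration; combine_tessdata, unicharset_extractor and merge_unicharsets are the Tesseract 4 training tools. The sketch assumes .box files already exist for the ground truth. With DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your own setup.
GT_DIR="${GT_DIR:-data/ground-truth}"
WORK_DIR="${WORK_DIR:-data/unicharset}"
OLD_MODEL="${OLD_MODEL:-$HOME/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata}"
DRY_RUN="${DRY_RUN:-1}"

# Illustrative helper: print each command, run it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

prepare_unicharset() {
    run mkdir -p "$WORK_DIR"
    # Unpack the components of the existing model, including its lstm-unicharset.
    run combine_tessdata -u "$OLD_MODEL" "$WORK_DIR/old."
    # Collect the characters that actually occur in the new ground truth.
    run unicharset_extractor --output_unicharset "$WORK_DIR/gt.unicharset" \
        --norm_mode 1 "$GT_DIR"/*.box
    # Merge old and new character sets into one unicharset.
    run merge_unicharsets "$WORK_DIR/old.lstm-unicharset" \
        "$WORK_DIR/gt.unicharset" "$WORK_DIR/merged.unicharset"
}

prepare_unicharset
```

The merged unicharset would then still have to be packed into a starter traineddata (for example with combine_lang_model) before it can be passed to lstmtraining via --traineddata. Whether --norm_mode 1 is the right choice for chi_sim is an assumption you should check.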

On Thursday, 18 April 2019 14:49:20 UTC+5:30, yixinl...@gmail.com wrote:
>
> Hello everyone,
>      I have used Tesseract 4.0 to train a chi_sim model, but the result is 
> not as good as I expected, so I came up with a way to fine-tune it. 
>
> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>      --training_text ../training_data/chi_sim_layer_training_text \
>      --langdata_dir ../langdata_lstm --tessdata_dir ./tessdata --lang chi_sim \
>      --linedata_only --noextract_font_properties --exposures "0" \
>      --maxpages 0 \
>      --workspace_dir ~/share/workspace/tmp \
>      --save_box_tiff \
>      --fontlist "NSimSun" \
>          "Times New Roman" \
>          "Arial Unicode MS" \
>          "SimSun" \
>          "Noto Sans CJK SC" \
>          "Noto Sans Mono CJK SC" \
>      --output_dir ~/tesstutorial/chi_sim_train \
>      --overwrite
>
> 2. mkdir -p ~/tesstutorial/chi_sim_layer_from_chi_sim
>
> 3. combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>      ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm
>
> 4. lstmtraining \
>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer \
>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm \
>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>      --old_traineddata ../tessdata_best/chi_sim.traineddata \
>      --append_index 5 --net_spec '[Lfx192 O1c1]' \
>      --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>      --max_iterations 40000
>
> 5. lstmtraining --stop_training \
>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer_checkpoint \
>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata
>
> The steps above are the normal procedure. I then want to continue fine-tuning 
> from the chi_sim_layer.traineddata obtained in step 5.
>
> For that I would use OCR-D's ocrd-train (https://github.com/OCR-D/ocrd-train).
>
> 6. Prepare ground-truth files (both .tif and .txt files).
> 7. Modify the Makefile in ocrd-train to suit my needs.
> 8. Run make training.
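
Before step 8 it may be worth checking that every line image has a transcription next to it. The function below is a minimal sketch: the name check_ground_truth is made up for illustration, and it assumes the naming convention I remember from the ocrd-train README (foo.tif paired with foo.gt.txt in one directory). If your setup uses plain .txt, change the suffix accordingly.

```shell
#!/bin/sh
# check_ground_truth DIR
# Report every .tif in DIR that has no non-empty .gt.txt beside it.
# Returns 0 when all images are paired, non-zero otherwise.
check_ground_truth() {
    dir="$1"
    missing=0
    for img in "$dir"/*.tif; do
        # Skip the literal pattern when DIR contains no .tif files.
        [ -e "$img" ] || continue
        txt="${img%.tif}.gt.txt"   # assumed naming: foo.tif -> foo.gt.txt
        if [ ! -s "$txt" ]; then
            echo "missing or empty transcription: $txt"
            missing=$((missing + 1))
        fi
    done
    echo "$missing image(s) without transcription"
    [ "$missing" -eq 0 ]
}
```

For example, check_ground_truth data/ground-truth && make training would start training only if nothing is missing (data/ground-truth being the assumed ground-truth directory).
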
>
> Can I do it this way? Please check whether it is feasible.
>
> Thank you in advance. Sorry for my poor English.


