> Why not just use ocrd for fine tune training? Just set up your START_MODEL
> as chi_sim.

Because I have already trained a chi_sim model with Tesseract-OCR, and I
don't have many sample images.
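
For anyone following the thread, here is a minimal sketch of a pre-flight check before fine-tuning with ocrd-train. It assumes the ocrd-train convention that each line image `foo.tif` has a transcription `foo.gt.txt` in the same folder, and that the Makefile accepts `MODEL_NAME` and `START_MODEL`. The folder `data/ground-truth`, the model name `chi_sim_ft`, and the helper `check_pairs` are illustrative and not taken from this thread. The script only counts pairs and prints the `make` command; it does not start training.

```shell
#!/bin/sh
# Sketch: verify ground-truth pairs before fine-tuning with ocrd-train.
# Assumption: each line image foo.tif has a transcription foo.gt.txt.

# check_pairs DIR: report images without a non-empty transcription on stderr,
# print the number of complete pairs on stdout, and succeed only if there is
# at least one pair and nothing is missing.
check_pairs() {
    dir=$1
    pairs=0
    missing=0
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue      # the glob matched nothing
        if [ -s "${img%.tif}.gt.txt" ]; then
            pairs=$((pairs + 1))
        else
            echo "missing transcription: $img" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$pairs"
    [ "$missing" -eq 0 ] && [ "$pairs" -gt 0 ]
}

# Hypothetical default location; override with GT_DIR=/path/to/ground-truth.
GT_DIR=${GT_DIR:-data/ground-truth}

if count=$(check_pairs "$GT_DIR"); then
    echo "$count pairs found; to fine-tune, run:"
    # Dry run only: MODEL_NAME is a made-up name, START_MODEL as suggested above.
    echo "make training MODEL_NAME=chi_sim_ft START_MODEL=chi_sim"
else
    echo "fix the ground truth in $GT_DIR first" >&2
fi
```

With only a few sample images, a check like this is cheap insurance: an image without a transcription is easier to spot here than in the middle of a training run.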


On Mon, Apr 22, 2019 at 8:34 PM Shanshan Wang <cool...@gmail.com> wrote:

> Why not just use ocrd for fine tune training? Just set up your START_MODEL
> as chi_sim.
>
> On Sun, Apr 21, 2019 at 19:34 易鑫 <yixinlucky...@gmail.com> wrote:
>
>> No, I want to fine-tune using actual images.
>>
>> On Fri, Apr 19, 2019 at 7:07 PM suraa syss <suras...@gmail.com> wrote:
>>
>>> Do you want to prepare the unicharset before LSTM training?
>>>
>>> On Thursday, 18 April 2019 14:49:20 UTC+5:30, yixinl...@gmail.com wrote:
>>>>
>>>> Hello, everyone:
>>>>      I have used Tesseract 4.0 to train a chi_sim model, but the result
>>>> is not as good as I expected, so I have thought of a way to fine-tune it.
>>>>
>>>> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>>      --training_text ../training_data/chi_sim_layer_training_text \
>>>>      --langdata_dir ../langdata_lstm --tessdata_dir ./tessdata \
>>>>      --lang chi_sim --linedata_only --noextract_font_properties \
>>>>      --exposures "0" \
>>>>      --maxpages 0 \
>>>>      --workspace_dir ~/share/workspace/tmp \
>>>>      --save_box_tiff \
>>>>      --fontlist "NSimSun" \
>>>>          "Times New Roman" \
>>>>          "Arial Unicode MS" \
>>>>          "SimSun" \
>>>>          "Noto Sans CJK SC" \
>>>>          "Noto Sans Mono CJK SC" \
>>>>      --output_dir ~/tesstutorial/chi_sim_train \
>>>>      --overwrite
>>>>
>>>> 2. mkdir -p ~/tesstutorial/chi_sim_layer_from_chi_sim
>>>>
>>>> 3. combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>>      ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm
>>>>
>>>> 4. lstmtraining \
>>>>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer \
>>>>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim.lstm \
>>>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>>      --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>>>      --append_index 5 --net_spec '[Lfx192 O1c1]' \
>>>>      --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>>      --max_iterations 40000
>>>>
>>>> 5. lstmtraining --stop_training \
>>>>      --continue_from ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer_checkpoint \
>>>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>>      --model_output ~/tesstutorial/chi_sim_layer_from_chi_sim/chi_sim_layer.traineddata
>>>>
>>>> The steps above are the normal procedure. I then want to continue
>>>> fine-tuning from the chi_sim_layer.traineddata obtained above.
>>>>
>>>> For that I would use OCR-D's ocrd-train
>>>> (https://github.com/OCR-D/ocrd-train):
>>>>
>>>> 6. Prepare the ground-truth files (both tif and txt files).
>>>> 7. Modify the Makefile in ocrd-train to suit my needs.
>>>> 8. make training
>>>>
>>>> Can I do it this way? Please check whether it is feasible.
>>>>
>>>> Thank you in advance. Sorry for my poor English.
>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/c09249de-79ff-4499-bd92-d459f99321e9%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/c09249de-79ff-4499-bd92-d459f99321e9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>

