On Tue, Aug 21, 2018 at 1:16 PM <j.biros@churadata.okinawa> wrote:

> Sorry, one more question. We set up 4 different machines all running the
> command below except for minor differences in the momentum and the learning
> rate. Changing the momentum and learning rate in this situation, because
> it is fine tuning, shouldn't affect anything right? In our case though
> each machine produced different results. Do you have any idea what exactly
> is causing this? I can provide more information as necessary. Thanks.
>
> training/lstmtraining \
>   --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
>   --continue_from ~/fine_tuning/models/jpn.lstm \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
>   --learning_rate 1e-4 \
>   --momentum 0.5 \
>   --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
>   --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
>   --max_iterations 100000 \
>   &>~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/basetrain.log \
>   --old_traineddata /usr/local/share/tessdata/__official_jpn.traineddata \
>   --debug_interval -1

The above is NOT a finetuning command, since you are providing the complete network spec.
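For comparison, a plain fine-tuning run keeps the existing network and simply continues training from the extracted .lstm, so --net_spec is omitted entirely. A minimal sketch, reusing the paths from your own command (adjust them to your setup; 400 iterations follows the finetune-for-impact recommendation below, and --old_traineddata is only needed if your unicharset differs from the original model's):

# Plain fine tuning: continue from the extracted .lstm, no --net_spec.
training/lstmtraining \
  --continue_from ~/fine_tuning/models/jpn.lstm \
  --old_traineddata /usr/local/share/tessdata/__official_jpn.traineddata \
  --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
  --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
  --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
  --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
  --max_iterations 400 \
  --debug_interval -1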
With finetuning for impact (a new font), the recommended number of iterations is 400. With finetuning for plusminus (adding a new character), the recommended number is 3000-3600. However, these iteration counts, as well as Ray's tutorial, are for English. I have found that they do not directly apply to other languages which require recoding of the unicharset.

You will get quicker results if you replace the top layer (compared to your earlier version, which might have started from scratch); see the sketch at the end of this message. You can try the different commands with --debug_interval -1, which prints the debug output on the console itself and gives you an idea of how the training is going, e.g.

File /tmp/tmp.o98cvEGUNe/akk/akk.CuneiformOB.exp-1.lstmf page 1 (Perfect):
Mean rms=0.167%, delta=0.772%, train=2.703%(4.359%), skip ratio=0.2%
Iteration 600506: ALIGNED TRUTH : 𒀀𒈾 𒉺𒉌𒅀
Iteration 600506: BEST OCR TEXT : 𒀀𒈾 𒉺𒉌𒅀

With finetuning, iteration 1 should start with a very low error rate. For training from scratch it may be as high as 400%. For replacing a layer it may start around 150% and come down to 100% after about 600 iterations.
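For reference, replacing the top layer means continuing from the extracted .lstm while cutting the network at a given layer index and appending a freshly initialized top. A rough sketch only, modelled on the layer-replacement example in the 4.00 training docs and reusing your paths; the --append_index value and the O1c... output size are assumptions that must match your actual network and the unicharset of the new traineddata:

# Replace the top LSTM layer and the output layer; layers below
# --append_index are kept from the checkpoint, the appended ones start random.
training/lstmtraining \
  --continue_from ~/fine_tuning/models/jpn.lstm \
  --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
  --append_index 5 --net_spec '[Lfx256 O1c111]' \
  --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
  --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
  --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
  --max_iterations 3600 \
  --debug_interval -1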