On Tue, Aug 21, 2018 at 1:16 PM <j.biros@churadata.okinawa> wrote:

> Sorry, one more question. We set up 4 different machines all running the
> command below except for minor differences in the momentum and the learning
> rate. Changing the momentum and learning rate in this situation, because
> it is fine tuning, shouldn't affect anything right? In our case though
> each machine produced different results. Do you have any idea what exactly
> is causing this? I can provide more information as necessary. Thanks.
>
> training/lstmtraining \
>   --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
>   --continue_from ~/fine_tuning/models/jpn.lstm \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
>   --learning_rate 1e-4 \
>   --momentum 0.5 \
>   --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
>   --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
>   --max_iterations 100000 \
>   &>~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/basetrain.log \
>   --old_traineddata /usr/local/share/tessdata/__official_jpn.traineddata \
>   --debug_interval -1

The above is NOT a finetuning command, since you are providing the complete network spec.
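For comparison, a plain fine-tuning run keeps the existing network and simply continues training from the extracted .lstm, so --net_spec is omitted entirely. A minimal sketch, reusing the paths from your own command (adjust them to your setup; 400 iterations follows the finetune-for-impact recommendation below, and --old_traineddata is only needed if your unicharset differs from the original model's):

# Plain fine tuning: continue from the extracted .lstm, no --net_spec.
training/lstmtraining \
  --continue_from ~/fine_tuning/models/jpn.lstm \
  --old_traineddata /usr/local/share/tessdata/__official_jpn.traineddata \
  --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
  --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
  --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
  --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
  --max_iterations 400 \
  --debug_interval -1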
With finetuning for impact (a new font), the recommended number of iterations is 400. With finetuning for plusminus (adding a new character), the recommended number is 3000-3600. However, these iteration counts, as well as Ray's tutorial, are for English. I have found that they do not directly apply to other languages which require recoding of the unicharset.

You will get quicker results if you replace the top layer (compared to your earlier version, which might have started from scratch); see the sketch at the end of this message. You can try the different commands with --debug_interval -1, which prints the debug output on the console itself and gives you an idea of how the training is going, e.g.

File /tmp/tmp.o98cvEGUNe/akk/akk.CuneiformOB.exp-1.lstmf page 1 (Perfect):
Mean rms=0.167%, delta=0.772%, train=2.703%(4.359%), skip ratio=0.2%
Iteration 600506: ALIGNED TRUTH : 𒀀𒈾 𒉺𒉌𒅀
Iteration 600506: BEST OCR TEXT : 𒀀𒈾 𒉺𒉌𒅀

With finetuning, iteration 1 should start with a very low error rate. For training from scratch it may be as high as 400%. For replacing a layer it may start around 150% and come down to 100% after about 600 iterations.
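For reference, replacing the top layer means continuing from the extracted .lstm while cutting the network at a given layer index and appending a freshly initialized top. A rough sketch only, modelled on the layer-replacement example in the 4.00 training docs and reusing your paths; the --append_index value and the O1c... output size are assumptions that must match your actual network and the unicharset of the new traineddata:

# Replace the top LSTM layer and the output layer; layers below
# --append_index are kept from the checkpoint, the appended ones start random.
training/lstmtraining \
  --continue_from ~/fine_tuning/models/jpn.lstm \
  --traineddata ~/tesstutorial/jpntrain/jpn/jpn.traineddata \
  --append_index 5 --net_spec '[Lfx256 O1c111]' \
  --model_output ~/bouch_fine_tuning/0819_fine20fonts_mom05_lr1e-4_010/base \
  --train_listfile ~/bouch_train/jpntrain/jpn.training_files.txt \
  --eval_listfile ~/bouch_train/jpneval/jpn.training_files.txt \
  --max_iterations 3600 \
  --debug_interval -1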