If you train long enough, you will see eval-related messages (the "Eval Char error rate" and "Word error rate" part) in the training log, e.g.:

Line 668: At iteration 9695/15400/15404, Mean rms=0.645%, delta=2.451%, char train=8.273%, word train=18.648%, skip ratio=0%, New worst char error = 8.273At iteration 6102, stage 0, Eval Char error rate=12.403291, Word error rate=24.582963 wrote checkpoint.
Line 821: At iteration 12562/22400/22405, Mean rms=0.579%, delta=2.053%, char train=7.1%, word train=16.53%, skip ratio=0%, New worst char error = 7.1At iteration 8689, stage 1, Eval Char error rate=12.493766, Word error rate=23.64956 wrote checkpoint.
Line 1009: At iteration 15946/31100/31106, Mean rms=0.525%, delta=1.768%, char train=5.69%, word train=15.172%, skip ratio=0%, New worst char error = 5.69At iteration 11557, stage 1, Eval Char error rate=7.7101831, Word error rate=19.192454 wrote checkpoint.
Line 1183: At iteration 18897/39200/39207, Mean rms=0.502%, delta=1.551%, char train=5.08%, word train=14.304%, skip ratio=0.1%, New worst char error = 5.08At iteration 14912, stage 1, Eval Char error rate=6.8221366, Word error rate=18.226883 wrote checkpoint.
Line 1413: At iteration 22667/50200/50210, Mean rms=0.433%, delta=1.197%, char train=3.977%, word train=11.758%, skip ratio=0%, New best char error = 3.977At iteration 17869, stage 1, Eval Char error rate=5.7822909, Word error rate=16.036021 wrote best model:/home/ubuntu/tesstutorial/IASTENG_LAYER/IASTENG_LAYER3.977_22667.checkpoint wrote checkpoint.
Line 1606: At iteration 25738/59300/59312, Mean rms=0.466%, delta=1.399%, char train=4.48%, word train=12.999%, skip ratio=0.1%, New worst char error = 4.48At iteration 19199, stage 1, Eval Char error rate=5.8820906, Word error rate=16.435243 wrote checkpoint.
Line 1791: At iteration 28593/68200/68212, Mean rms=0.412%, delta=1.016%, char train=3.424%, word train=10.999%, skip ratio=0%, New worst char error = 3.424At iteration 24127, stage 1, Eval Char error rate=4.4509122, Word error rate=13.741829 wrote checkpoint.
Line 1924: At iteration 30533/74500/74513, Mean rms=0.399%, delta=1.078%, char train=3.749%, word train=10.475%, skip ratio=0%, New worst char error = 3.749At iteration 27583, stage 1, Eval Char error rate=4.3155356, Word error rate=13.993133 wrote checkpoint.
Line 2112: At iteration 33286/83400/83416, Mean rms=0.381%, delta=0.947%, char train=3.051%, word train=10.002%, skip ratio=0%, New best char error = 3.051At iteration 29521, stage 1, Eval Char error rate=4.3376752, Word error rate=13.312631 wrote checkpoint.
Line 2308: At iteration 36028/92600/92619, Mean rms=0.408%, delta=1.106%, char train=3.788%, word train=11.206%, skip ratio=0%, New worst char error = 3.788At iteration 31215, stage 1, Eval Char error rate=3.9168943, Word error rate=12.539135 wrote checkpoint.
Line 2425: At iteration 37731/98200/98220, Mean rms=0.411%, delta=1.101%, char train=3.824%, word train=11.042%, skip ratio=0%, New worst char error = 3.824At iteration 34699, stage 1, Eval Char error rate=3.7448292, Word error rate=12.167938 wrote checkpoint.
Line 2555: At iteration 39621/104500/104520, Mean rms=0.368%, delta=0.848%, char train=2.771%, word train=9.92%, skip ratio=0%, New best char error = 2.771At iteration 36028, stage 1, Eval Char error rate=3.8032456, Word error rate=12.157691 wrote best model:/home/ubuntu/tesstutorial/IASTENG_LAYER/IASTENG_LAYER2.771_39621.checkpoint wrote checkpoint.
Line 2693: At iteration 41440/110800/110823, Mean rms=0.358%, delta=0.865%, char train=2.847%, word train=9.352%, skip ratio=0.1%, New worst char error = 2.847At iteration 37814, stage 1, Eval Char error rate=3.8059549, Word error rate=12.294499 wrote checkpoint.
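
If you want to follow training error against eval error over time, log lines like the ones above can be pulled apart with a small script. This is only a rough sketch and not part of Tesseract: the regular expression is an assumption based solely on the lines shown in this message, and `parse_eval_lines` and the field names are hypothetical. It does not try to interpret the three iteration counters; it keeps only the first one.

```python
import re

# A number such as "7.1" or "12.403291".
NUM = r"\d+(?:\.\d+)?"

# Assumed layout, taken only from the log lines quoted in this message:
# a training part ("char train=...%, word train=...%") followed directly
# by an eval part ("At iteration N, stage S, Eval Char error rate=...").
LOG_RE = re.compile(
    r"At iteration (?P<iteration>\d+)/\d+/\d+,"
    rf".*?char train=(?P<char_train>{NUM})%,"
    rf"\s*word train=(?P<word_train>{NUM})%,"
    r".*?At iteration (?P<eval_iteration>\d+), stage (?P<stage>\d+),"
    rf"\s*Eval Char error rate=(?P<eval_char>{NUM}),"
    rf"\s*Word error rate=(?P<eval_word>{NUM})"
)


def parse_eval_lines(log_text):
    """Return one dict per log line that carries both train and eval errors."""
    rows = []
    for m in LOG_RE.finditer(log_text):
        rows.append({
            "iteration": int(m.group("iteration")),  # first of the three counters
            "char_train": float(m.group("char_train")),
            "word_train": float(m.group("word_train")),
            "eval_iteration": int(m.group("eval_iteration")),
            "stage": int(m.group("stage")),
            "eval_char": float(m.group("eval_char")),
            "eval_word": float(m.group("eval_word")),
        })
    return rows


# Two lines copied from the log above.
sample = (
    "At iteration 9695/15400/15404, Mean rms=0.645%, delta=2.451%, "
    "char train=8.273%, word train=18.648%, skip ratio=0%, "
    "New worst char error = 8.273At iteration 6102, stage 0, "
    "Eval Char error rate=12.403291, Word error rate=24.582963 wrote checkpoint.\n"
    "At iteration 12562/22400/22405, Mean rms=0.579%, delta=2.053%, "
    "char train=7.1%, word train=16.53%, skip ratio=0%, "
    "New worst char error = 7.1At iteration 8689, stage 1, "
    "Eval Char error rate=12.493766, Word error rate=23.64956 wrote checkpoint.\n"
)

rows = parse_eval_lines(sample)
for row in rows:
    print(row["iteration"], row["char_train"], row["eval_char"])
```

Training lines that have no eval part are skipped, so the result contains only the points where both error rates are available.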

On Fri, Jun 28, 2019 at 9:09 PM Shree Devi Kumar <shreesh...@gmail.com> wrote:

> Your best source for documentation is the source code. See
>
> https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L371
>
> https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L382
>
> On Fri, Jun 28, 2019 at 8:47 PM Arno Loo <arno.laf...@gmail.com> wrote:
>
>> I am continuing to experiment and trying to understand what seems
>> important, and I have a few questions after searching Tesseract's wiki.
>>
>> During training we see this kind of information:
>> At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, char train=96.314%, word train=100%, skip ratio=0%, New best char error = 96.314 wrote checkpoint.
>>
>> - *100/100/100:* What do these 3 numbers at the beginning mean when they
>> are different? They often are, unlike in my example.
>> - *Mean rms:* I know this one, it's the root mean square error. But which
>> error metric is used? Usually it is some kind of distance; the Levenshtein
>> distance is often appropriate for OCR tasks, but the "%" wouldn't be there
>> if it were.
>> - *delta:* I don't know.
>> - *char train* must be the percentage of wrong character predictions
>> during *training*.
>> - *word train* must be the percentage of wrong word predictions during
>> *training*.
>> - *skip ratio* is, I think, the percentage of samples skipped for any
>> reason (invalid data or something).
>>
>> Can anyone help me understand them, please?
>>
>> Also, I do not see any evaluation error during training, which would be
>> really helpful to avoid overfitting. The only way I know to follow the
>> *evaluation* error during training would be to run lstmeval on each
>> checkpoint, but I think there must be a better way? Otherwise the
>> *--eval_listfile* argument would be useless in lstmtraining, but I can't
>> find out how it is used.
>>
>> Thank you :)
>>
>> On Thursday, June 27, 2019 at 19:17:46 UTC+2, shree wrote:
>>>
>>> See
>>> https://github.com/tesseract-ocr/tesseract/blob/master/doc/lstmeval.1.asc
>>>
>>> When using a checkpoint you also need to use the starter traineddata
>>> file used for training.
>>>
>>> Or give the final traineddata file as the model.
>>>
>>> So, if after training you have converted the checkpoint to a
>>> traineddata, you can use that as the model. Similarly for the original
>>> traineddata.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVhwB8-Q4rMd4UaOvnxKchfeHuxMHC%3DO_tq9W2CRFLXcQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.