Your best source for documentation is the source code. See https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L371
https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L382

On Fri, Jun 28, 2019 at 8:47 PM Arno Loo <arno.laf...@gmail.com> wrote:

> I am continuing to experiment and trying to understand what seems
> important, and I have a few questions after searching Tesseract's wiki.
>
> During training we see this kind of information:
>
> At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, char train=96.314%, word train=100%, skip ratio=0%, New best char error = 96.314 wrote checkpoint.
>
> - *100/100/100:* What do these 3 numbers at the beginning mean when they
>   are different? They often are, unlike in my example.
> - *Mean rms:* I know this one well, it's the root mean square error. But
>   which error metric is used? Usually it is some kind of distance. The
>   Levenshtein distance is often appropriate for OCR tasks, but the "%"
>   wouldn't be there if that were the case.
> - *delta:* I don't know.
> - *char train* must be the percentage of wrong character predictions
>   during *training*.
> - *word train* must be the percentage of wrong word predictions during
>   *training*.
> - *skip ratio* is, I think, the percentage of samples skipped for any
>   reason (invalid data or something similar).
>
> Can anyone help me understand them, please?
>
> Also, I do not see any evaluation error during training, which would be
> really helpful for avoiding overfitting. The only way I know of to follow
> the *evaluation* error during training is to run lstmeval on each
> checkpoint, but I think there must be a better way. Otherwise the
> *--eval_listfile* argument would be useless in lstmtraining, but I can't
> find out how it is used.
>
> Thank you :)
>
> On Thursday, June 27, 2019 at 19:17:46 UTC+2, shree wrote:
>>
>> See
>> https://github.com/tesseract-ocr/tesseract/blob/master/doc/lstmeval.1.asc
>>
>> When using a checkpoint as the model, you also need to pass the starter
>> traineddata file that was used for training.
>>
>> Or give the final traineddata file as the model.
>>
>> So, if you have converted the checkpoint to a traineddata file after
>> training, you can use that as the model. The same goes for the original
>> traineddata.
>>

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXO1-hTEL0PgRRLxscbkO%2BcyCJiw53f9qReGZxFEuz_cA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
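
For anyone who wants to follow these figures across a training run, the progress line quoted above can be pulled apart with a few lines of Python. This is only a sketch: `parse_progress_line` is a hypothetical helper, not part of Tesseract, and it assumes the log format matches the example in the quoted message. It extracts the numbers without interpreting them; for what each field means, the lstmtrainer.cpp lines linked at the top remain the reference.

```python
import re

# Example line copied from the quoted message (mail line wrapping removed).
LOG_LINE = (
    "At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, "
    "char train=96.314%, word train=100%, skip ratio=0%, "
    "New best char error = 96.314 wrote checkpoint."
)


def parse_progress_line(line):
    """Hypothetical helper: extract the numbers from one progress line.

    Returns None when the line does not look like a progress line.
    """
    counters = re.search(r"At iteration (\d+)/(\d+)/(\d+)", line)
    if counters is None:
        return None
    # The three slash-separated counters are kept as-is, without naming them.
    result = {"iteration_counters": tuple(int(g) for g in counters.groups())}
    # Every "name=value%" pair becomes a float keyed by the lowercased name.
    for name, value in re.findall(r"([A-Za-z][A-Za-z ]*?)\s*=\s*([0-9.]+)%", line):
        result[name.strip().lower()] = float(value)
    return result


if __name__ == "__main__":
    for key, value in parse_progress_line(LOG_LINE).items():
        print(key, value)
```

Collecting these dictionaries for every progress line in a saved training log gives a table that can be plotted next to lstmeval results for the same checkpoints.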