I am continuing to experiment and trying to understand what matters, and I 
have a few questions after searching Tesseract's wiki.

During training, we see this kind of output:
At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, char train=96.314%, 
word train=100%, skip ratio=0%,  New best char error = 96.314 wrote 
checkpoint.

- *100/100/100:* What do these three numbers at the beginning mean when 
they differ? They often do, unlike in my example.
- *Mean rms:* I know this is the root mean square error, but which error 
metric is it computed over? Usually it is some kind of distance. The 
Levenshtein distance is often appropriate for OCR tasks, but then the "%" 
wouldn't be there.
- *delta:* I don't know what this is.
- *char train* must be the percentage of wrong character predictions during 
*training*.
- *word train* must be the percentage of wrong word predictions during 
*training*.
- *skip ratio* is, I think, the percentage of samples skipped for any reason 
(invalid data or something similar).
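
To make the fields concrete, here is a minimal sketch that pulls them out of 
a log line, assuming the format is exactly the one in the sample above. The 
function name parse_training_line is hypothetical, and the sketch only 
extracts the numbers; it says nothing about what each one means.

```python
import re

# Sample line copied from the training output quoted above.
LINE = ("At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, "
        "char train=96.314%, word train=100%, skip ratio=0%,  "
        "New best char error = 96.314 wrote checkpoint.")


def parse_training_line(line):
    """Return the iteration triple and every 'name=value%' field as a dict.

    Assumption: the line follows the layout of the sample above.
    """
    fields = {}
    match = re.search(r"At iteration (\d+)/(\d+)/(\d+)", line)
    if match:
        fields["iteration"] = tuple(int(g) for g in match.groups())
    # Each metric looks like 'Mean rms=4.514%': a name, '=', a number, '%'.
    for name, value in re.findall(r"([A-Za-z][A-Za-z ]*?)=([\d.]+)%", line):
        fields[name.strip()] = float(value)
    return fields


if __name__ == "__main__":
    for key, value in parse_training_line(LINE).items():
        print(f"{key}: {value}")
```

Collecting these values over many iterations makes it easy to plot the 
training error, although it still gives no evaluation error.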

Can anyone help me understand them, please?

Also, I do not see any evaluation error during training, which would be 
really helpful for avoiding overfitting. The only way I know to follow the 
*evaluation* error during training is to run lstmeval on each checkpoint, 
but I think there must be a better way? Otherwise the *--eval_listfile* 
argument of lstmtraining would be useless, but I can't find out how it is 
used.
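
The workaround of running lstmeval on each checkpoint could be sketched as 
follows. It uses lstmeval's --model, --traineddata and --eval_listfile 
options, passing the starter traineddata alongside each checkpoint as 
described in the reply quoted below. The "*.checkpoint" file pattern, the 
function name and the paths in the usage comment are assumptions, not 
something confirmed by the documentation.

```python
from pathlib import Path


def lstmeval_commands(checkpoint_dir, starter_traineddata, eval_listfile):
    """Build one lstmeval command line per checkpoint file.

    Assumption: checkpoints in checkpoint_dir end in '.checkpoint'.
    """
    commands = []
    for checkpoint in sorted(Path(checkpoint_dir).glob("*.checkpoint")):
        commands.append([
            "lstmeval",
            "--model", str(checkpoint),
            # A checkpoint needs the starter traineddata used for training.
            "--traineddata", str(starter_traineddata),
            "--eval_listfile", str(eval_listfile),
        ])
    return commands


# Usage sketch (hypothetical paths); each command can be run with
# subprocess.run(cmd, check=True) to print the evaluation error:
#
#   for cmd in lstmeval_commands("output", "eng/eng.traineddata",
#                                "eval/eng.training_files.txt"):
#       print(" ".join(cmd))
```

This only builds the commands, so they can be inspected before anything is 
executed.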

Thank you :)

On Thursday, 27 June 2019 at 19:17:46 UTC+2, shree wrote:
>
> See 
> https://github.com/tesseract-ocr/tesseract/blob/master/doc/lstmeval.1.asc
>
> When using a checkpoint, you also need to use the starter traineddata 
> file used for training.
>
> Or give the final traineddata file as the model.
>
> So, if after training you have converted the checkpoint to a traineddata, 
> you can use that as the model. Similarly for the original traineddata.
>
