Re: [tesseract-ocr] understading lstmeval and use it on pretrained models for comparison

Shree Devi Kumar Fri, 28 Jun 2019 08:40:40 -0700

Your best source for documentation is the source code. See

https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L371



https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L382
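
To compare runs, it can help to pull the numbers out of each progress line. Below is a rough Python sketch that parses a line shaped like the sample quoted further down. The field names "learning", "training" and "sample" for the three iteration counters are my reading of the logging code linked above, so treat them as an assumption and check them against the source. The helper name parse_progress_line is made up for this example.

```python
import re

# Pattern modeled on the sample progress line quoted in this thread.
# The names of the three iteration counters are an assumption.
LOG_PATTERN = re.compile(
    r"At iteration (?P<learning>\d+)/(?P<training>\d+)/(?P<sample>\d+), "
    r"Mean rms=\s*(?P<rms>[\d.]+)%, "
    r"delta=\s*(?P<delta>[\d.]+)%, "
    r"char train=\s*(?P<char_train>[\d.]+)%, "
    r"word train=\s*(?P<word_train>[\d.]+)%, "
    r"skip ratio=\s*(?P<skip_ratio>[\d.]+)%"
)


def parse_progress_line(line):
    """Return the numeric fields of one progress line as a dict, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The three iteration counters are integers; the rest are percentages.
    result = {name: int(fields[name]) for name in ("learning", "training", "sample")}
    for name in ("rms", "delta", "char_train", "word_train", "skip_ratio"):
        result[name] = float(fields[name])
    return result


sample = (
    "At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, "
    "char train=96.314%, word train=100%, skip ratio=0%,  "
    "New best char error = 96.314 wrote checkpoint."
)
parsed = parse_progress_line(sample)
print(parsed)
```

Lines that do not match the pattern return None, so the function can be run over a whole training log and the non-progress lines skipped.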


On Fri, Jun 28, 2019 at 8:47 PM Arno Loo <arno.laf...@gmail.com> wrote:

> I am continuing to experiment and trying to understand what matters, and I
> have a few questions after searching Tesseract's wiki.
>
> During training we see this kind of output:
> At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, char train=
> 96.314%, word train=100%, skip ratio=0%,  New best char error = 96.314
> wrote checkpoint.
>
> - *100/100/100:* What do these 3 numbers at the beginning mean when they
> differ? They often do, unlike in my example.
> - *Mean rms:* I know this one, it's the root mean square error. But which
> error metric is used? Usually it is some kind of distance. The Levenshtein
> distance is often appropriate for OCR tasks, but the "%" wouldn't be there
> if it were.
> - *delta:* I don't know.
> - *char train* must be the percentage of wrong character predictions
> during *training*.
> - *word train* must be the percentage of wrong word predictions during
> *training*.
> - *skip ratio* is, I think, the percentage of samples skipped for any
> reason (invalid data or something).
>
> Can anyone help me understand them, please?
>
> Also, I do not see any evaluation error during training, which would be
> really helpful to avoid overfitting. The only way I know to follow the
> *evaluation* error during training is to run lstmeval on each checkpoint,
> but I think there must be a better way. Otherwise the *--eval_listfile*
> argument to lstmtraining would be useless, but I can't find out how it is
> used.
>
> Thank you :)
>
> On Thursday, 27 June 2019 at 19:17:46 UTC+2, shree wrote:
>>
>> See
>> https://github.com/tesseract-ocr/tesseract/blob/master/doc/lstmeval.1.asc
>>
>> When using a checkpoint, you also need to pass the starter traineddata
>> file used for training.
>>
>> Or give the final traineddata file as the model.
>>
>> So, if after training you have converted the checkpoint to a traineddata
>> file, you can use that as the model. Similarly for the original traineddata.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/19f392d5-6d77-4830-93ff-c446d06df6fa%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/19f392d5-6d77-4830-93ff-c446d06df6fa%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
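
For running lstmeval on each checkpoint, as asked above, here is a minimal Python sketch under some assumptions. It uses the --model, --traineddata and --eval_listfile options from the lstmeval man page linked earlier, and adds --traineddata only when the model is a checkpoint, per the advice quoted above. The file names and the two helper functions are hypothetical, and I have not assumed anything about the format of the lstmeval report, so the script only prints it.

```python
import glob
import subprocess


def build_lstmeval_command(model, eval_listfile, traineddata=None):
    """Build the argument list for one lstmeval run."""
    cmd = ["lstmeval", "--model", model, "--eval_listfile", eval_listfile]
    if model.endswith(".checkpoint"):
        # A checkpoint needs the starter traineddata used for training.
        if traineddata is None:
            raise ValueError("a checkpoint needs the starter traineddata file")
        cmd += ["--traineddata", traineddata]
    return cmd


def evaluate_checkpoints(checkpoint_glob, eval_listfile, traineddata):
    """Run lstmeval on every checkpoint matching the glob and print its report."""
    for checkpoint in sorted(glob.glob(checkpoint_glob)):
        cmd = build_lstmeval_command(checkpoint, eval_listfile, traineddata)
        completed = subprocess.run(cmd, capture_output=True, text=True)
        # Print both streams rather than assume which one carries the report.
        print(checkpoint)
        print(completed.stdout + completed.stderr)


# Hypothetical paths, shown only to illustrate the resulting command line.
print(" ".join(build_lstmeval_command(
    "output/base_checkpoint.checkpoint",
    "data/list.eval",
    "data/starter.traineddata",
)))
```

Calling evaluate_checkpoints("output/*.checkpoint", "data/list.eval", "data/starter.traineddata") would then loop over the saved checkpoints, assuming lstmeval is on the PATH.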


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

