Re: [tesseract-ocr] understading lstmeval and use it on pretrained models for comparison

Shree Devi Kumar Fri, 28 Jun 2019 08:46:10 -0700

If you train long enough, you will see eval-related messages, e.g.:

Line 668: At iteration 9695/15400/15404, Mean rms=0.645%, delta=2.451%,
char train=8.273%, word train=18.648%, skip ratio=0%,  New worst char error
= 8.273At iteration 6102, stage 0, Eval Char error rate=12.403291, Word
error rate=24.582963 wrote checkpoint.
Line 821: At iteration 12562/22400/22405, Mean rms=0.579%, delta=2.053%,
char train=7.1%, word train=16.53%, skip ratio=0%,  New worst char error =
7.1At iteration 8689, stage 1, Eval Char error rate=12.493766, Word error
rate=23.64956 wrote checkpoint.
Line 1009: At iteration 15946/31100/31106, Mean rms=0.525%, delta=1.768%,
char train=5.69%, word train=15.172%, skip ratio=0%,  New worst char error
= 5.69At iteration 11557, stage 1, Eval Char error rate=7.7101831, Word
error rate=19.192454 wrote checkpoint.
Line 1183: At iteration 18897/39200/39207, Mean rms=0.502%, delta=1.551%,
char train=5.08%, word train=14.304%, skip ratio=0.1%,  New worst char
error = 5.08At iteration 14912, stage 1, Eval Char error rate=6.8221366,
Word error rate=18.226883 wrote checkpoint.
Line 1413: At iteration 22667/50200/50210, Mean rms=0.433%, delta=1.197%,
char train=3.977%, word train=11.758%, skip ratio=0%,  New best char error
= 3.977At iteration 17869, stage 1, Eval Char error rate=5.7822909, Word
error rate=16.036021 wrote best
model:/home/ubuntu/tesstutorial/IASTENG_LAYER/IASTENG_LAYER3.977_22667.checkpoint
wrote checkpoint.
Line 1606: At iteration 25738/59300/59312, Mean rms=0.466%, delta=1.399%,
char train=4.48%, word train=12.999%, skip ratio=0.1%,  New worst char
error = 4.48At iteration 19199, stage 1, Eval Char error rate=5.8820906,
Word error rate=16.435243 wrote checkpoint.
Line 1791: At iteration 28593/68200/68212, Mean rms=0.412%, delta=1.016%,
char train=3.424%, word train=10.999%, skip ratio=0%,  New worst char error
= 3.424At iteration 24127, stage 1, Eval Char error rate=4.4509122, Word
error rate=13.741829 wrote checkpoint.
Line 1924: At iteration 30533/74500/74513, Mean rms=0.399%, delta=1.078%,
char train=3.749%, word train=10.475%, skip ratio=0%,  New worst char error
= 3.749At iteration 27583, stage 1, Eval Char error rate=4.3155356, Word
error rate=13.993133 wrote checkpoint.
Line 2112: At iteration 33286/83400/83416, Mean rms=0.381%, delta=0.947%,
char train=3.051%, word train=10.002%, skip ratio=0%,  New best char error
= 3.051At iteration 29521, stage 1, Eval Char error rate=4.3376752, Word
error rate=13.312631 wrote checkpoint.
Line 2308: At iteration 36028/92600/92619, Mean rms=0.408%, delta=1.106%,
char train=3.788%, word train=11.206%, skip ratio=0%,  New worst char error
= 3.788At iteration 31215, stage 1, Eval Char error rate=3.9168943, Word
error rate=12.539135 wrote checkpoint.
Line 2425: At iteration 37731/98200/98220, Mean rms=0.411%, delta=1.101%,
char train=3.824%, word train=11.042%, skip ratio=0%,  New worst char error
= 3.824At iteration 34699, stage 1, Eval Char error rate=3.7448292, Word
error rate=12.167938 wrote checkpoint.
Line 2555: At iteration 39621/104500/104520, Mean rms=0.368%, delta=0.848%,
char train=2.771%, word train=9.92%, skip ratio=0%,  New best char error =
2.771At iteration 36028, stage 1, Eval Char error rate=3.8032456, Word
error rate=12.157691 wrote best
model:/home/ubuntu/tesstutorial/IASTENG_LAYER/IASTENG_LAYER2.771_39621.checkpoint
wrote checkpoint.
Line 2693: At iteration 41440/110800/110823, Mean rms=0.358%, delta=0.865%,
char train=2.847%, word train=9.352%, skip ratio=0.1%,  New worst char
error = 2.847At iteration 37814, stage 1, Eval Char error rate=3.8059549,
Word error rate=12.294499 wrote checkpoint.
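
A minimal sketch of how the train and eval figures in messages like the ones
above could be pulled out and compared, which is one way to watch for the
overfitting asked about in the quoted message. The regular expressions are
written only against the excerpt in this mail and may need adjusting for other
Tesseract versions. `parse_log`, `TRAIN_RE` and `EVAL_RE` are illustrative
names, not part of Tesseract, and the three iteration counters are kept as a
plain tuple without interpretation.

```python
import re

# Training part of a log entry, as it appears in the excerpt above.
TRAIN_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%, skip ratio=([\d.]+)%"
)
# Eval part, which follows the training part once an evaluation has run.
EVAL_RE = re.compile(
    r"At iteration (\d+), stage (\d+), Eval Char error rate=([\d.]+), "
    r"Word error rate=([\d.]+)"
)


def parse_log(text):
    """Return one dict per training entry, with eval figures when present."""
    # Collapse line wraps and double spaces so the patterns see single spaces.
    flat = " ".join(text.split())
    events = [(m.start(), "train", m) for m in TRAIN_RE.finditer(flat)]
    events += [(m.start(), "eval", m) for m in EVAL_RE.finditer(flat)]
    events.sort(key=lambda e: e[0])

    rows = []
    for _, kind, m in events:
        if kind == "train":
            rows.append({
                "iterations": tuple(int(g) for g in m.group(1, 2, 3)),
                "char_train": float(m.group(6)),
                "word_train": float(m.group(7)),
                "eval_char": None,
                "eval_word": None,
            })
        elif rows:
            # Attach the eval figures to the training entry printed just before.
            rows[-1]["eval_char"] = float(m.group(3))
            rows[-1]["eval_word"] = float(m.group(4))
    return rows


# Two entries copied from the log above, still wrapped as in the mail.
SAMPLE = """At iteration 9695/15400/15404, Mean rms=0.645%, delta=2.451%,
char train=8.273%, word train=18.648%, skip ratio=0%,  New worst char error
= 8.273At iteration 6102, stage 0, Eval Char error rate=12.403291, Word
error rate=24.582963 wrote checkpoint.
At iteration 33286/83400/83416, Mean rms=0.381%, delta=0.947%,
char train=3.051%, word train=10.002%, skip ratio=0%,  New best char error
= 3.051At iteration 29521, stage 1, Eval Char error rate=4.3376752, Word
error rate=13.312631 wrote checkpoint."""

rows = parse_log(SAMPLE)
for r in rows:
    if r["eval_char"] is not None:
        gap = r["eval_char"] - r["char_train"]
        print(r["iterations"][0], r["char_train"], r["eval_char"], round(gap, 3))
```

If the gap between the eval and train figures keeps growing over successive
entries, that would suggest the model is starting to overfit.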


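For comparing a checkpoint against a pretrained model with lstmeval, the advice
quoted further down in this thread can be sketched as below. This is only an
illustration: `lstmeval_command` is a hypothetical helper, the checkpoint path
is the one from the log above, and the list file and traineddata paths are
made-up placeholders. The helper assumes that any model not ending in
`.traineddata` is a checkpoint.

```python
def lstmeval_command(model, eval_listfile, starter_traineddata=None):
    """Build an lstmeval argument list for a checkpoint or a traineddata model."""
    cmd = ["lstmeval", "--model", str(model), "--eval_listfile", str(eval_listfile)]
    if not str(model).endswith(".traineddata"):
        # Assumption: anything that is not a .traineddata file is a checkpoint,
        # which also needs the starter traineddata used for training.
        if starter_traineddata is None:
            raise ValueError("a checkpoint model also needs the starter traineddata")
        cmd += ["--traineddata", str(starter_traineddata)]
    return cmd


# Checkpoint path from the log in this thread; the other paths are hypothetical.
checkpoint_cmd = lstmeval_command(
    "/home/ubuntu/tesstutorial/IASTENG_LAYER/IASTENG_LAYER2.771_39621.checkpoint",
    "eval/list.txt",
    starter_traineddata="starter/eng.traineddata",
)
# A traineddata file (pretrained or converted from a checkpoint) needs no extra file.
pretrained_cmd = lstmeval_command("tessdata_best/eng.traineddata", "eval/list.txt")

print(" ".join(checkpoint_cmd))
print(" ".join(pretrained_cmd))
```

Each list could be handed to `subprocess.run` to run the evaluation, using the
same eval list file for both models so the error rates are comparable.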

On Fri, Jun 28, 2019 at 9:09 PM Shree Devi Kumar <shreesh...@gmail.com>
wrote:

> Your best source for documentation is the source code. See
>
>
> https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L371
>
>
>
> https://github.com/tesseract-ocr/tesseract/blob/f522b039a52ae0094fb928ac60a66c4ae0f6c5b9/src/training/lstmtrainer.cpp#L382
>
>
> On Fri, Jun 28, 2019 at 8:47 PM Arno Loo <arno.laf...@gmail.com> wrote:
>
>> I am continuing to experiment and trying to understand what seems
>> important, and I have a few questions after searching Tesseract's wiki.
>>
>> During training we can see this kind of information:
>> At iteration 100/100/100, Mean rms=4.514%, delta=19.089%, char train=
>> 96.314%, word train=100%, skip ratio=0%,  New best char error = 96.314
>> wrote checkpoint.
>>
>> - *100/100/100:* What do these 3 numbers at the beginning mean when they
>> are different? They often are, unlike in my example.
>> - *Mean rms:* I know this is the Root Mean Square error, but which
>> error metric is used? Usually it is some kind of distance. The Levenshtein
>> distance is often appropriate for OCR tasks, but the "%" wouldn't be there
>> if it were that.
>> - *delta:* I don't know.
>> - *char train* must be the percentage of wrong character predictions
>> during *training*.
>> - *word train* must be the percentage of wrong word predictions during
>> *training*.
>> - *skip ratio* is, I think, the percentage of samples skipped for any reason
>> (invalid data or something).
>>
>> Can anyone help me understand them, please?
>>
>> Also, I do not see any evaluation error during training, which
>> would be really helpful to avoid overfitting. The only way I know
>> to follow the *evaluation* error during training would be to run
>> lstmeval on each checkpoint, but I think there must be a better way.
>> Otherwise the *--eval_listfile* argument would be useless in
>> lstmtraining, but I can't find out how it is used.
>>
>> Thank you :)
>>
>> On Thursday, 27 June 2019 at 19:17:46 UTC+2, shree wrote:
>>>
>>> See
>>> https://github.com/tesseract-ocr/tesseract/blob/master/doc/lstmeval.1.asc
>>>
>>> When using a checkpoint as the model, you also need to pass the starter
>>> traineddata file used for training.
>>>
>>> Alternatively, give the final traineddata file as the model.
>>>
>>> So if, after training, you have converted the checkpoint to a traineddata
>>> file, you can use that as the model. The same applies to the original
>>> traineddata.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/19f392d5-6d77-4830-93ff-c446d06df6fa%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>


