Re: [tesseract-ocr] Evaluation of model trained with generated text against real-word data

Inductiveload Tue, 27 Jul 2021 05:46:17 -0700

On Mon, 26 Jul 2021 at 06:59, Inductiveload <inductivel...@gmail.com> wrote:
> What is the correct way to generate the required data for running lstmeval 
> manually in this case?


I did actually figure this out in the end. So, in case anyone else in
future is as dumb as me, and to save anyone from trying to answer a
solved problem, here's my solution (x-posted at stackoverflow[2]).

You can generate the .lstmf files needed for the evaluation like this,
assuming the evaluation ground truth is in
tesstrain/data/eval-ground-truth:

cd tesstrain
make lists MODEL_NAME=eval

This will generate a file data/eval/all-lstmf, which contains a list
of all the .lstmf files generated. The list.eval file contains only a
subset, because the ground-truth corpus is partitioned into evaluation
and training sets (according to RATIO_TRAIN), so all-lstmf is the list
to use for evaluating against the whole corpus.
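
To picture why list.eval is only a subset, here is a rough sketch of a
ratio-based split of a list file. This is an illustration of the idea
only, not copied from the tesstrain Makefile; split_list and its
arguments are names I made up for the example.

```shell
#!/bin/sh
# Sketch only: a head/tail split of a list file by ratio.
# Assumption: this mimics the idea of RATIO_TRAIN, it is NOT tesstrain's code.
# Usage: split_list LISTFILE RATIO OUTDIR
#   writes OUTDIR/list.train (first RATIO share) and OUTDIR/list.eval (the rest)
split_list() {
    listfile=$1
    ratio=$2
    outdir=$3
    total=$(wc -l < "$listfile")
    # awk handles the fractional ratio; %d truncates to a whole line count.
    train=$(awk -v t="$total" -v r="$ratio" 'BEGIN { printf "%d", t * r }')
    eval_count=$((total - train))
    head -n "$train" "$listfile" > "$outdir/list.train"
    tail -n "$eval_count" "$listfile" > "$outdir/list.eval"
}
```

With a 10-line list and a ratio of 0.5, this leaves 5 entries in each
output file, which is why neither of them alone covers the whole
ground-truth set.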

You can then run lstmeval:

lstmeval \
    --model data/your_model.traineddata  \
    --eval_listfile data/eval/all-lstmf
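
Before running the evaluation it can be worth checking that every
entry in the list file actually exists. The helper below is a small
sketch for that; check_listfile is a hypothetical name, and it assumes
the paths in the list resolve from the directory you run it in.

```shell
#!/bin/sh
# Sketch only: report list entries that do not point to an existing file.
# Usage: check_listfile LISTFILE
# Returns 0 if every non-empty line names an existing file, 1 otherwise.
check_listfile() {
    missing=0
    while IFS= read -r path; do
        # Skip blank lines.
        if [ -z "$path" ]; then
            continue
        fi
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done < "$1"
    return "$missing"
}
```

For example, run "check_listfile data/eval/all-lstmf" from the
tesstrain directory and only start lstmeval if it returns success.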

This produces something like the following (a mistake was deliberately
added to the ground truth of one .gt.txt file to provoke an error for
the example):

Warning: LSTMTrainer deserialized an LSTMRecognizer!
Truth:TThoſe hypocrites that live amongſt us,
OCR  :Those hypocrites that live amongst us,
At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333

If there are no errors (as was actually the case here before I added
the mistake), it looks like:

Warning: LSTMTrainer deserialized an LSTMRecognizer!
At iteration 0, stage 0, Eval Char error rate=0.000000, Word error rate=0.000000
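
If you want just the two numbers, for instance to compare several
models, the summary line can be picked apart with sed. This is a
sketch that assumes the summary line has exactly the format shown
above; parse_rates is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: extract "CER WER" from lstmeval's summary line.
# Assumption: the line looks like
#   "... Eval Char error rate=1.282051, Word error rate=8.333333"
# Reads the lstmeval output on stdin and prints the two numbers.
parse_rates() {
    sed -n 's/.*Eval Char error rate=\([0-9.]*\), Word error rate=\([0-9.]*\).*/\1 \2/p'
}
```

Usage would be along the lines of:

lstmeval \
    --model data/your_model.traineddata  \
    --eval_listfile data/eval/all-lstmf 2>&1 | parse_rates

The 2>&1 is there in case the report is written to stderr rather than
stdout.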

Cheers!

[2] https://stackoverflow.com/questions/68523440/evaluation-of-a-trained-on-generated-images-tesseract-4-lstm-model-against-real
