On Mon, 26 Jul 2021 at 06:59, Inductiveload <inductivel...@gmail.com> wrote:
> What is the correct way to generate the required data for running lstmeval
> manually in this case?

I did actually figure this out in the end. In case anyone else in future is as dumb as me, and to save anyone answering a solved problem, here's my solution (cross-posted on Stack Overflow [2]).

You can generate the .lstmf files needed for the evaluation like this, assuming the evaluation ground truth is in tesstrain/data/eval-ground-truth:

    cd tesstrain
    make lists MODEL_NAME=eval

This generates a file data/eval/all-lstmf, which lists all the .lstmf files that were generated. The list.eval file contains only a subset, because the ground-truth corpus is partitioned into evaluation and training sets (according to RATIO_TRAIN), so all-lstmf is the list to use here.

You can then run lstmeval:

    lstmeval \
      --model data/your_model.traineddata \
      --eval_listfile data/eval/all-lstmf

This produces something like the following (the mistake below was deliberately added to the ground truth in one .gt.txt file to provoke an error, for example purposes):

    Warning: LSTMTrainer deserialized an LSTMRecognizer!
    Truth:TThoſe hypocrites that live amongſt us,
    OCR :Those hypocrites that live amongst us,
    At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333

If there are no errors (as was the case here before the mistake was added), the output looks like:

    Warning: LSTMTrainer deserialized an LSTMRecognizer!
    At iteration 0, stage 0, Eval Char error rate=0.000000, Word error rate=0.000000

Cheers!

[2] https://stackoverflow.com/questions/68523440/evaluation-of-a-trained-on-generated-images-tesseract-4-lstm-model-against-real
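
If the error rates are needed in a script (for example, to compare several models), the summary line can be picked apart with standard tools. The following is only a minimal sketch: it assumes the summary line has exactly the "Char error rate=..., Word error rate=..." form shown in the sample output in this post, and parse_lstmeval_rates is a hypothetical helper name, not part of Tesseract or tesstrain.

```shell
#!/bin/sh
# Hypothetical helper: reads lstmeval output on stdin and prints
# "<char error rate> <word error rate>" for each summary line.
# Assumes the line format shown in the sample output in this post, e.g.:
#   At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333
parse_lstmeval_rates() {
    sed -n 's/.*Char error rate=\([0-9.]*\), Word error rate=\([0-9.]*\).*/\1 \2/p'
}

# Demonstration using the sample summary line from this post.
sample='At iteration 0, stage 0, Eval Char error rate=1.282051, Word error rate=8.333333'
rates=$(printf '%s\n' "$sample" | parse_lstmeval_rates)
echo "$rates"
```

In practice the function would be fed from the lstmeval command itself, e.g. `lstmeval --model ... --eval_listfile ... 2>&1 | parse_lstmeval_rates`; the `2>&1` is there on the assumption that the log lines may be written to stderr rather than stdout.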