Hi,

Could someone help me understand why I am getting the following error when 
using tesstrain with the START_MODEL option? 
      Failed to continue from: data/micr_proto/micr.lstm

From my local tesstrain repo (cloned from
https://github.com/tesseract-ocr/tesstrain), I have the following in my
data directory:

data
├── micr-ground-truth
│   ├── micr-1.gt.txt
│   ├── micr-1.tif
│   ├── micr-2.gt.txt
│   └── micr-2.tif
└── micr_proto-ground-truth
    ├── micr.gt.txt
    └── micr.tif

I use the files in 'micr_proto-ground-truth' to build my proto model,
which I then use as the START_MODEL when training the micr model on
'micr-ground-truth'.
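
Before training, it can help to confirm that every image in a ground-truth directory has a transcription with the same basename (tesstrain pairs an image such as `micr-1.tif` with `micr-1.gt.txt`). A minimal shell sketch, assuming the images are `.tif` files only; the function name `check_gt_pairs` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report ground-truth files that lack a partner.
# Assumes images are .tif and transcriptions are <basename>.gt.txt,
# as in the data/micr-ground-truth layout above.
check_gt_pairs() {
  local dir=$1 f base
  for f in "$dir"/*.tif; do
    [ -e "$f" ] || continue            # glob matched nothing
    base=${f%.tif}
    [ -f "$base.gt.txt" ] || echo "missing transcription: ${f##*/}"
  done
  for f in "$dir"/*.gt.txt; do
    [ -e "$f" ] || continue            # glob matched nothing
    base=${f%.gt.txt}
    [ -f "$base.tif" ] || echo "missing image: ${f##*/}"
  done
}
```

For example, `check_gt_pairs data/micr-ground-truth` prints nothing when every file has its partner.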

More specifically, I issued the following commands from my tesstrain repo:
      gmake tesseract-langdata
      gmake proto-model MODEL_NAME=micr_proto
      mkdir -p usr/share/tessdata
      cp data/micr_proto/micr_proto.traineddata usr/share/tessdata
      gmake training MODEL_NAME=micr START_MODEL=micr_proto

The final command fails with the following error:
      Failed to continue from: data/micr_proto/micr.lstm
      gmake: *** [Makefile:327: data/micr/checkpoints/micr_checkpoint] Error 1
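
The file named in the error ends in `.lstm`, which is the name Tesseract uses for the LSTM network component of a traineddata file. One way to see what the copied proto model actually contains before passing it as START_MODEL is `combine_tessdata -d`, which ships with Tesseract and lists the components stored in a `.traineddata` file. The helper name `inspect_traineddata` and its default path (mirroring the `cp` step above) are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the components of a .traineddata file.
# `combine_tessdata -d` comes with Tesseract; the default path mirrors the
# `cp` step above and is an assumption.
inspect_traineddata() {
  local model=${1:-usr/share/tessdata/micr_proto.traineddata}
  if [ ! -s "$model" ]; then
    echo "missing or empty: $model" >&2
    return 1
  fi
  combine_tessdata -d "$model"
}
```

Running it on the proto model and on a stock model such as `eng.traineddata` makes it easy to compare which components each one includes.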

Can anyone tell me what I am doing wrong?

*Background Info*
My ultimate goal is to train Tesseract to OCR the MICR line at the bottom
of check images with 99+% accuracy.

For my training/test set, I have more than 20K TIFF check images, which I
have cropped and cleaned with OpenCV so that they contain only the bottom
strip holding the MICR line. I also have a .gt.txt file for each cropped
image.
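
To put a number on "99+% accuracy", one simple metric is the share of images whose OCR text exactly matches the ground truth (character error rate is a finer-grained alternative). A minimal sketch, assuming each `<name>.gt.txt` has an OCR result named `<name>.txt` in a separate output directory; the function name and that naming scheme are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical scorer: share of images whose OCR text exactly matches the
# ground truth. Assumes each <name>.gt.txt in GT_DIR has an OCR result
# named <name>.txt in OUT_DIR (that naming scheme is an assumption).
line_accuracy() {
  local gt_dir=$1 out_dir=$2 total=0 hits=0 gt name pred
  for gt in "$gt_dir"/*.gt.txt; do
    [ -e "$gt" ] || continue             # glob matched nothing
    name=${gt##*/}
    name=${name%.gt.txt}
    pred=$out_dir/$name.txt
    total=$((total + 1))
    [ -f "$pred" ] || continue           # missing output counts as a miss
    # Tesseract's text output ends with a form feed by default; drop \f and
    # \r, and let $(...) strip trailing newlines before comparing.
    if [ "$(tr -d '\f\r' < "$pred")" = "$(tr -d '\r' < "$gt")" ]; then
      hits=$((hits + 1))
    fi
  done
  awk -v h="$hits" -v t="$total" 'BEGIN {
    if (t > 0) printf "%d/%d lines exact (%.2f%%)\n", h, t, 100 * h / t
    else print "no ground truth found"
  }'
}
```

Exact match is strict on purpose: for MICR data a single wrong digit usually makes the whole line unusable.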

I tried mcr.traineddata
(from https://github.com/BigPino67/Tesseract-MICR-OCR/blob/master/Tessdata/mcr.traineddata)
with multiple PSM values, but the accuracy was very low.
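
A PSM comparison like this can be scripted so that every mode is run over the same cropped images. This sketch assumes the crops are `.tif` files in one directory and that `mcr.traineddata` is installed where tesseract can find it; the function name `psm_sweep`, the output layout and the PSM list (6 = single uniform block, 7 = single text line, 13 = raw line) are assumptions, while `-l` and `--psm` are standard tesseract options:

```shell
#!/usr/bin/env bash
# Sketch: run tesseract once per page segmentation mode over every cropped
# image. Directory layout, function name and PSM list are assumptions.
psm_sweep() {
  local img_dir=$1 out_root=$2 lang=${3:-mcr} psm img name
  for psm in 6 7 13; do                 # 6 = uniform block, 7 = single line, 13 = raw line
    mkdir -p "$out_root/psm$psm"
    for img in "$img_dir"/*.tif; do
      [ -e "$img" ] || continue         # glob matched nothing
      name=${img##*/}
      name=${name%.tif}
      # tesseract writes its text result to <outputbase>.txt
      tesseract "$img" "$out_root/psm$psm/$name" -l "$lang" --psm "$psm" 2>/dev/null \
        || echo "tesseract failed on $name (psm $psm)" >&2
    done
  done
}
```

For example, `psm_sweep data/micr-ground-truth ocr_out mcr` leaves one `psmN` directory of `.txt` files per mode, each of which can then be compared against the `.gt.txt` transcriptions.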

I also tried running tesstrain directly, with my entire training set in
the data directory:
      gmake training MODEL_NAME=micr
but the resulting micr.traineddata gave even worse results.

So now I am trying to build my proto model as described above using a 
single reference image, and then to use that as the START_MODEL for my 
training, but I am hitting the error I mentioned above.

Is my approach wrong? If so, can you point me in the right direction? I
have not found the documentation very clear, so I may well be doing
something stupid.

Thanks much for the help,
Keith

BTW, I have attached data.zip (the contents of my data directory) in case
someone wants to reproduce this.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/da620ad4-0686-4583-91a4-178bfd81b422n%40googlegroups.com.

<<attachment: data.zip>>
