[tesseract-ocr] not training on image after loading data

Kumar Rajwani Fri, 05 Feb 2021 02:35:26 -0800

Hi,
I am trying to fine-tune eng.traineddata on my own images. I have tried to
train several times, but each time I get stuck somewhere. Can you tell me how
to proceed?
Current steps:

Step 1: make box files


%%bash
for file in *.jpg; do
  echo "$file"
  base=$(basename "$file" .jpg)
  # the lstmbox config writes <base>.box next to the image
  tesseract "$file" "$base" lstmbox
done
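Before moving on, it may help to confirm that step 1 actually produced a non-empty box file for every image, since a missing or empty box file would leave that image without usable training lines later. A minimal sketch, assuming each `image.jpg` should have a matching `image.box` in the same directory; `check_box_files` is a hypothetical helper, not a Tesseract tool:

```shell
#!/usr/bin/env bash
# Sketch: report images whose .box file is missing or empty.
# Assumption: image.jpg and image.box share a directory and basename.
# check_box_files is a hypothetical helper name.
check_box_files() {
  local dir="${1:-.}" missing=0 img base
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue          # glob matched nothing
    base="${img%.jpg}"
    if [ ! -s "$base.box" ]; then      # -s: file exists and is non-empty
      echo "missing or empty: $base.box"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]                 # status 0 only if every box file is present
}
```

In a notebook, the definition and the call (for example `check_box_files .`) need to sit in the same `%%bash` cell, because each `%%bash` cell runs in its own shell.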


Step 2: make .lstmf files
%%bash
for file in *.jpg; do
  echo "$file"
  base=$(basename "$file" .jpg)
  # lstm.train uses <base>.box from step 1 and writes <base>.lstmf
  tesseract "$file" "$base" lstm.train
done
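The lstmtraining command in step 4 reads list.train and list.eval, but the steps above do not show how those lists are built. A minimal sketch of one way to write them from the .lstmf files; holding out every tenth file for evaluation is an arbitrary assumption, and `make_lists` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: write list.train and list.eval from the .lstmf files in a directory.
# Assumptions: every non-empty .lstmf file is usable, and every Nth file
# (default 10) goes to the evaluation list. make_lists is hypothetical.
make_lists() {
  local dir="${1:-.}" every="${2:-10}" i=0 f
  : > "$dir/list.train"                # create or truncate both lists
  : > "$dir/list.eval"
  for f in "$dir"/*.lstmf; do
    [ -s "$f" ] || continue            # skip unmatched glob or empty files
    i=$((i + 1))
    if [ $((i % every)) -eq 0 ]; then
      echo "$f" >> "$dir/list.eval"
    else
      echo "$f" >> "$dir/list.train"
    fi
  done
  [ -s "$dir/list.train" ]             # fail if no training file was listed
}
```

With fewer than `every` files, list.eval stays empty, so a smaller second argument (for example `make_lists . 5`) may be needed for small data sets.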

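Step 3: create the unicharset
%%bash
# wrap N PREFIX SUFFIX prints PREFIX0SUFFIX ... PREFIXNSUFFIX, one per line
function wrap {
    for i in $(seq 0 "$1"); do
        echo "$2$i$3"
    done
}
# N is the highest box-file index; N=0 expands to eng.arial.exp0.box only
N=0
unicharset_extractor $(wrap $N "eng.arial.exp" ".box")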
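With `N=0`, `wrap` expands to eng.arial.exp0.box alone, while the documents loaded in step 4 go up to eng.arial.exp9. A minimal sketch that passes every non-empty box file matching that naming to `unicharset_extractor` instead of relying on a fixed index range; the helper names are hypothetical, and writing the result to a file called `unicharset` is an assumption:

```shell
#!/usr/bin/env bash
# Sketch: build the unicharset from all eng.arial.exp*.box files on disk.
# Assumptions: box files live in the current directory and follow the
# eng.arial.expN.box naming; helper names are hypothetical.
shopt -s nullglob                      # unmatched glob expands to nothing

list_box_files() {
  local f
  for f in eng.arial.exp*.box; do
    if [ -s "$f" ]; then               # skip empty box files
      printf '%s\n' "$f"
    fi
  done
}

build_unicharset() {
  local boxes=()
  mapfile -t boxes < <(list_box_files)
  if [ "${#boxes[@]}" -eq 0 ]; then
    echo "no box files found" >&2
    return 1
  fi
  unicharset_extractor --output_unicharset unicharset "${boxes[@]}"
}
```

Calling `build_unicharset` in the same cell as the definitions runs the extractor once over all box files.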

Step 4: start training

!lstmtraining \
 --model_output output/ \
 --continue_from lstm_model/eng.lstm \
 --traineddata /usr/share/tesseract-ocr/5/tessdata/eng.traineddata \
 --train_listfile list.train \
 --eval_listfile list.eval \
 --max_iterations 400

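When lstmtraining loads the documents and then appears to stop, it may be worth ruling out the input files first. A minimal preflight sketch, assuming the paths from the command above; `check_training_inputs` is a hypothetical helper. It checks that the starting model and both list files exist and that every listed path points to a non-empty file, and it prints a `combine_tessdata -e` hint (the usual way to extract an .lstm component from a traineddata file) if the model is missing:

```shell
#!/usr/bin/env bash
# Sketch: preflight checks for the lstmtraining inputs used above.
# check_training_inputs is hypothetical; default paths mirror the command.
check_training_inputs() {
  local model="${1:-lstm_model/eng.lstm}"
  local train_list="${2:-list.train}" eval_list="${3:-list.eval}"
  local status=0 list path
  if [ ! -s "$model" ]; then
    echo "missing start model: $model"
    echo "  hint: combine_tessdata -e <path>/eng.traineddata $model"
    status=1
  fi
  for list in "$train_list" "$eval_list"; do
    if [ ! -s "$list" ]; then
      echo "missing or empty list file: $list"
      status=1
      continue
    fi
    # each non-blank line should name an existing, non-empty .lstmf file
    while IFS= read -r path || [ -n "$path" ]; do
      [ -z "$path" ] && continue
      if [ ! -s "$path" ]; then
        echo "$list: unreadable entry: $path"
        status=1
      fi
    done < "$list"
  done
  return "$status"
}
```

Running `check_training_inputs` before `lstmtraining` (in the same shell) prints nothing and returns 0 when all inputs look usable.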

Step 4 gives the following output:
Loaded file lstm_model/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from lstm_model/eng.lstm
Loaded 128/128 lines (1-128) of document eng.arial.exp0.lstmf
Loaded 131/131 lines (1-131) of document eng.arial.exp9.lstmf
Loaded 135/135 lines (1-135) of document eng.arial.exp7.lstmf
Loaded 114/114 lines (1-114) of document eng.arial.exp2.lstmf
Loaded 93/93 lines (1-93) of document eng.arial.exp6.lstmf
Loaded 104/104 lines (1-104) of document eng.arial.exp4.lstmf
Loaded 88/88 lines (1-88) of document eng.arial.exp5.lstmf
Loaded 131/131 lines (1-131) of document eng.arial.exp3.lstmf

After this, it does not start training.
What changes should I make to get training to run successfully?

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/977d82fc-c2a6-4c3d-8db5-c6c917e9c8c0n%40googlegroups.com.
