I had this error when I was mixing "_best" models with non-"_best" models.

I would try running this again:

combine_tessdata -e base_model/eng.traineddata base_model/eng.lstm

to regenerate eng.lstm from the "_best" model (the models in
/usr/share/tessdata are not the "_best" ones).
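
As a rough sketch of that step: the snippet below fetches the English "_best" model and extracts its LSTM component. The base_model directory follows the command above; the download URL, the use of wget, and the function names are my assumptions, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: BASE_DIR default, BEST_URL and the function names are
# assumptions for illustration, not part of ocrd-train.
BASE_DIR="${BASE_DIR:-base_model}"
BEST_URL="${BEST_URL:-https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata}"

# Download the "_best" English model into BASE_DIR.
fetch_best_model() {
    mkdir -p "$BASE_DIR" &&
    wget -O "$BASE_DIR/eng.traineddata" "$BEST_URL"
}

# Extract only the LSTM component into BASE_DIR/eng.lstm.
extract_lstm() {
    combine_tessdata -e "$BASE_DIR/eng.traineddata" "$BASE_DIR/eng.lstm"
}
```

After sourcing this, run `fetch_best_model && extract_lstm`, then point --continue_from at base_model/eng.lstm and --old_traineddata at base_model/eng.traineddata, so both come from the same model.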

Then, if the error is still there, I would also recreate the lstmf files,
just to be sure (I do not really know if it matters).
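
One way to force the lstmf files to be recreated is to delete the stale ones together with the generated list files, so that the next make run rebuilds them. This is a minimal sketch, assuming the data/ layout visible in the quoted log and a Makefile that regenerates missing .lstmf files; clean_lstmf is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: clean_lstmf is a hypothetical helper. The paths mirror the
# quoted log (data/train/*.lstmf, data/all-lstmf, data/list.*).
clean_lstmf() {
    data_dir="${1:-data}"
    # Remove the generated training samples, but keep images and ground truth.
    rm -f "$data_dir"/train/*.lstmf
    # Remove the file lists derived from those samples.
    rm -f "$data_dir/all-lstmf" "$data_dir/list.train" "$data_dir/list.eval"
}
```

Call `clean_lstmf data` from the ocrd-train directory and then rerun make.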


Lorenzo


2018-07-23 22:56 GMT+02:00 Emiliano Isaza Villamizar <eis...@gmail.com>:

> Hello everyone,
>
>
> I'm trying to train tesseract to improve the detection of some prices such
> as: CN¥2,400.48. I got to a point where I keep getting this error:
>
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "$total * 0.90 / 1" | bc`; \
>    head -n "$no" data/all-lstmf > "data/list.train"
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "($total - $total * 0.90) / 1" | bc`; \
>    tail -n "+$no" data/all-lstmf > "data/list.eval"
> combine_lang_model \
>   --input_unicharset data/unicharset \
>   --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
>   --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
>   --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
>   --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
>   --output_dir data/ \
>   --lang eng
> Loaded unicharset of size 113 from file data/unicharset
> Setting unichar properties
> Other case É of é is not in unicharset
> Setting script properties
> Config file is optional, continuing...
> Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
> Null char=2
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> mkdir -p data/checkpoints
> lstmtraining \
>   --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
>   --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
>   --traineddata data/eng/eng.traineddata \
>   --model_output data/checkpoints/eng \
>   --debug_interval -1 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --sequential_training \
>   --max_iterations 3000
> Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 111 to 112!
> Num (Extended) outputs,weights in Series:
>   1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys64:64, 20736
>   Lfx96:96, 61824
>   Lrx96:96, 74112
>   Lfx512:512, 1247232
>   Fc112:112, 0
> Total weights = 1404064
> Previous null char=110 mapped to 111
> Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
> Iteration 0: ALIGNED TRUTH : CN¥2,400.48
> Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
> File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
> make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)
>
> I already tried downloading the best/tessdata eng.traineddata and
> substituting it in the continue_from, but I haven't been able to get past
> this error. Any thoughts?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%
> 40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
