I had this error when I was mixing "best" models with non-"best" models. I would try running

    combine_tessdata -e base_model/eng.traineddata base_model/eng.lstm

again to generate eng.lstm from the "_best" model (the models in /usr/share/tessdata are not the "_best" models). If the error is still there, I would also recreate the lstmf files, just to be sure; I do not really know whether that matters.

Lorenzo

2018-07-23 22:56 GMT+02:00 Emiliano Isaza Villamizar <eis...@gmail.com>:

> Hello everyone,
>
> I'm trying to train tesseract to improve the detection of some prices such
> as: CN¥2,400.48. I got to a point where I keep getting this error:
>
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "$total * 0.90 / 1" | bc`; \
>    head -n "$no" data/all-lstmf > "data/list.train"
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "($total - $total * 0.90) / 1" | bc`; \
>    tail -n "+$no" data/all-lstmf > "data/list.eval"
> combine_lang_model \
>   --input_unicharset data/unicharset \
>   --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
>   --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
>   --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
>   --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
>   --output_dir data/ \
>   --lang eng
> Loaded unicharset of size 113 from file data/unicharset
> Setting unichar properties
> Other case É of é is not in unicharset
> Setting script properties
> Config file is optional, continuing...
> Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
> Null char=2
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> mkdir -p data/checkpoints
> lstmtraining \
>   --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
>   --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
>   --traineddata data/eng/eng.traineddata \
>   --model_output data/checkpoints/eng \
>   --debug_interval -1 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --sequential_training \
>   --max_iterations 3000
> Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 111 to 112!
> Num (Extended) outputs,weights in Series:
>   1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys64:64, 20736
>   Lfx96:96, 61824
>   Lrx96:96, 74112
>   Lfx512:512, 1247232
>   Fc112:112, 0
> Total weights = 1404064
> Previous null char=110 mapped to 111
> Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
> Iteration 0: ALIGNED TRUTH : CN¥2,400.48
> Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
> File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
> make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)
>
> I already tried downloading the best/tessdata eng.traineddata and
> substituting it in the continue_from, but I haven't been able to get past
> this error. Any thoughts?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
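
The re-extraction step suggested at the top of this reply could be wrapped in a small shell function, so that a missing file or a missing tool is reported before training starts. This is only a sketch: the function name extract_lstm and its return codes are made up for illustration, and base_model/ is assumed to hold an eng.traineddata downloaded from tessdata_best. Only the combine_tessdata -e call itself comes from the message.

```shell
#!/bin/sh
# Sketch: re-extract the LSTM component from a "_best" traineddata file.
# extract_lstm is a hypothetical helper, not part of Tesseract or ocrd-train.

extract_lstm() {
    model="$1"   # path to the "_best" eng.traineddata (assumed location)
    out="$2"     # where the extracted eng.lstm should be written

    # Refuse to continue if the traineddata file is not there.
    if [ ! -f "$model" ]; then
        echo "missing traineddata: $model" >&2
        return 1
    fi

    # Refuse to continue if the Tesseract training tools are not installed.
    if ! command -v combine_tessdata >/dev/null 2>&1; then
        echo "combine_tessdata not found in PATH" >&2
        return 2
    fi

    # Same command as in the reply: -e extracts the component named by
    # the output file's extension (here .lstm) from the traineddata.
    combine_tessdata -e "$model" "$out"
}

# Example call (commented out so that sourcing this file has no side effects):
# extract_lstm base_model/eng.traineddata base_model/eng.lstm
```

Called as in the commented example, this would write the base_model/eng.lstm that --continue_from then points to. Presumably --old_traineddata should reference the same "_best" eng.traineddata, so that the two arguments do not mix model types.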