[tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 249

Kristóf Horváth Mon, 04 Feb 2019 04:33:20 -0800

I'm using Cygwin (64-bit, on Windows 10) to compile Tesseract. I ran the 
following commands and got the following error:
>
> kh@DSAD-6 /usr/share/tessdata
>
> $ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Arial" "Impact 
>> Condensed" --lang eng --linedata_only --noextract_font_properties 
>> --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir 
>> ~/tesstutorial/engtrain
>
>
>> === Starting training for language 'eng'
>
> [Mon, Feb 4, 2019 1:17:48 PM] /usr/bin/text2image 
>> --fonts_dir=/usr/share/fonts --font=Arial 
>> --outputbase=/tmp/font_tmp.bEkR4qa83g/sample_text.txt 
>> --text=/tmp/font_tmp.bEkR4qa83g/sample_text.txt 
>> --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g
>
> Rendered page 0 to file /tmp/font_tmp.bEkR4qa83g/sample_text.txt.tif
>
>
>> === Phase I: Generating training images ===
>
> Rendering using Arial
>
> [Mon, Feb 4, 2019 1:17:51 PM] /usr/bin/text2image 
>> --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts 
>> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 
>> --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Arial.exp0 
>> --max_pages=0 --font=Arial --text=/home/kh/langdata//eng/eng.training_text
>
> Rendering using Impact Condensed
>
> Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
>
> [Mon, Feb 4, 2019 1:17:52 PM] /usr/bin/text2image 
>> --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts 
>> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 
>> --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 
>> --max_pages=0 --font=Impact Condensed 
>> --text=/home/kh/langdata//eng/eng.training_text
>
> Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
>
> Rendered page 0 to file 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
>
> Rendered page 1 to file 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
>
>
>> === Phase UP: Generating unicharset and unichar properties files ===
>
> [Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/unicharset_extractor 
>> --output_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --norm_mode 1 
>> /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
>
> Extracting unicharset from box file 
>> /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box
>
> Extracting unicharset from box file 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
>
> Other case É of é is not in unicharset
>
> Wrote unicharset file /tmp/eng-2019-02-04.pCA/eng.unicharset
>
> [Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/set_unicharset_properties -U 
>> /tmp/eng-2019-02-04.pCA/eng.unicharset -O 
>> /tmp/eng-2019-02-04.pCA/eng.unicharset -X 
>> /tmp/eng-2019-02-04.pCA/eng.xheights --script_dir=/home/kh/langdata/
>
> Loaded unicharset of size 111 from file 
>> /tmp/eng-2019-02-04.pCA/eng.unicharset
>
> Setting unichar properties
>
> Other case É of é is not in unicharset
>
> Setting script properties
>
> Warning: properties incomplete for index 25 = ~
>
> Writing unicharset to file /tmp/eng-2019-02-04.pCA/eng.unicharset
>
>
>> === Phase E: Generating lstmf files ===
>
> Using TESSDATA_PREFIX=./
>
> [Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract 
>> /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif 
>> /tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --psm 6 lstm.train
>
> [Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --psm 6 lstm.train
>
> Tesseract Open Source OCR Engine v4.0.0 with Leptonica
>
> Page 1
>
> Tesseract Open Source OCR Engine v4.0.0 with Leptonica
>
> Page 1
>
> Page 2
>
> Page 2
>
> Loaded 49/49 pages (1-49) of document 
>> /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf
>
> Loaded 52/52 pages (1-52) of document 
>> /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf
>
>
>> === Constructing LSTM training data ===
>
> [Mon, Feb 4, 2019 1:17:57 PM] /usr/bin/combine_lang_model 
>> --input_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --script_dir 
>> /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers 
>> /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc 
>> --output_dir /home/kh/tesstutorial/engtrain --lang eng
>
> Loaded unicharset of size 111 from file 
>> /tmp/eng-2019-02-04.pCA/eng.unicharset
>
> Setting unichar properties
>
> Other case É of é is not in unicharset
>
> Setting script properties
>
> Config file is optional, continuing...
>
> Failed to read data from: /home/kh/langdata//eng/eng.config
>
> Null char=2
>
> Reducing Trie to SquishedDawg
>
> Reducing Trie to SquishedDawg
>
> Reducing Trie to SquishedDawg
>
>
>> === Moving lstmf files for training data ===
>
> Moving /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf to 
>> /home/kh/tesstutorial/engtrain
>
> Moving /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf to 
>> /home/kh/tesstutorial/engtrain
>
>
>> Created starter traineddata for language 'eng'
>
>
>>
>> Run lstmtraining to do the LSTM training for language 'eng'
>
>
>>
>> kh@DSAD-6 /usr/share/tessdata
>
> $ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Impact Condensed" 
>> --lang eng --linedata_only --noextract_font_properties --langdata_dir 
>> ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engeval
>
>
>> === Starting training for language 'eng'
>
> [Mon, Feb 4, 2019 1:21:10 PM] /usr/bin/text2image 
>> --fonts_dir=/usr/share/fonts --font=Impact Condensed 
>> --outputbase=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt 
>> --text=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt 
>> --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5
>
> Rendered page 0 to file /tmp/font_tmp.e96rRhOoQ5/sample_text.txt.tif
>
>
>> === Phase I: Generating training images ===
>
> Rendering using Impact Condensed
>
> [Mon, Feb 4, 2019 1:21:14 PM] /usr/bin/text2image 
>> --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5 --fonts_dir=/usr/share/fonts 
>> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 
>> --exposure=0 --outputbase=/tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 
>> --max_pages=0 --font=Impact Condensed 
>> --text=/home/kh/langdata//eng/eng.training_text
>
> Rendered page 0 to file 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
>
> Rendered page 1 to file 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
>
>
>> === Phase UP: Generating unicharset and unichar properties files ===
>
> [Mon, Feb 4, 2019 1:21:16 PM] /usr/bin/unicharset_extractor 
>> --output_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --norm_mode 1 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
>
> Extracting unicharset from box file 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
>
> Other case É of é is not in unicharset
>
> Wrote unicharset file /tmp/eng-2019-02-04.TL6/eng.unicharset
>
> [Mon, Feb 4, 2019 1:21:17 PM] /usr/bin/set_unicharset_properties -U 
>> /tmp/eng-2019-02-04.TL6/eng.unicharset -O 
>> /tmp/eng-2019-02-04.TL6/eng.unicharset -X 
>> /tmp/eng-2019-02-04.TL6/eng.xheights --script_dir=/home/kh/langdata/
>
> Loaded unicharset of size 111 from file 
>> /tmp/eng-2019-02-04.TL6/eng.unicharset
>
> Setting unichar properties
>
> Other case É of é is not in unicharset
>
> Setting script properties
>
> Warning: properties incomplete for index 25 = ~
>
> Writing unicharset to file /tmp/eng-2019-02-04.TL6/eng.unicharset
>
>
>> === Phase E: Generating lstmf files ===
>
> Using TESSDATA_PREFIX=./
>
> [Mon, Feb 4, 2019 1:21:17 PM] /usr/local/bin/tesseract 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --psm 6 lstm.train
>
> Tesseract Open Source OCR Engine v4.0.0 with Leptonica
>
> Page 1
>
> Page 2
>
> Loaded 49/49 pages (1-49) of document 
>> /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf
>
>
>> === Constructing LSTM training data ===
>
> [Mon, Feb 4, 2019 1:21:19 PM] /usr/bin/combine_lang_model 
>> --input_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --script_dir 
>> /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers 
>> /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc 
>> --output_dir /home/kh/tesstutorial/engeval --lang eng
>
> Loaded unicharset of size 111 from file 
>> /tmp/eng-2019-02-04.TL6/eng.unicharset
>
> Setting unichar properties
>
> Other case É of é is not in unicharset
>
> Setting script properties
>
> Config file is optional, continuing...
>
> Failed to read data from: /home/kh/langdata//eng/eng.config
>
> Null char=2
>
> Reducing Trie to SquishedDawg
>
> Reducing Trie to SquishedDawg
>
> Reducing Trie to SquishedDawg
>
>
>> === Moving lstmf files for training data ===
>
> Moving /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf to 
>> /home/kh/tesstutorial/engeval
>
>
>> Created starter traineddata for language 'eng'
>
>
>>
>> Run lstmtraining to do the LSTM training for language 'eng'
>
>
>>
>> kh@DSAD-6 /usr/share/tessdata
>
> $ combine_tessdata -e ./eng.traineddata  ~/tesstutorial/engoutput/eng.lstm
>
> Extracting tessdata components from ./eng.traineddata
>
> Wrote /home/kh/tesstutorial/engoutput/eng.lstm
>
> Version string:4.00.00alpha:eng:synth20170629
>
> 17:lstm:size=401636, offset=192
>
> 18:lstm-punc-dawg:size=4322, offset=401828
>
> 19:lstm-word-dawg:size=3694794, offset=406150
>
> 20:lstm-number-dawg:size=4738, offset=4100944
>
> 21:lstm-unicharset:size=6360, offset=4105682
>
> 22:lstm-recoder:size=1012, offset=4112042
>
> 23:version:size=30, offset=4113054
>
>
>> kh@DSAD-6 /usr/share/tessdata
>
> $ lstmtraining --model_output ~/tesstutorial/engoutput/impact 
>> --continue_from ~/tesstutorial/engoutput/eng.lstm --traineddata 
>> ~/tesstutorial/engtrain/eng/eng.traineddata --old_traineddata 
>> ./eng.traineddata --max_iterations 3600 -train_listfile 
>> ~/tesstutorial/engtrain/eng.training_files.txt
>
> Loaded file /home/kh/tesstutorial/engoutput/eng.lstm, unpacking...
>
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>
> Code range changed from 111 to 110!
>
> Num (Extended) outputs,weights in Series:
>
>   1,36,0,1:1, 0
>
> Num (Extended) outputs,weights in Series:
>
>   C3,3:9, 0
>
>   Ft16:16, 160
>
> Total weights = 160
>
>   [C3,3Ft16]:16, 160
>
>   Mp3,3:16, 0
>
>   Lfys48:48, 12480
>
>   Lfx96:96, 55680
>
>   Lrx96:96, 74112
>
>   Lfx192:192, 221952
>
>   Fc110:110, 0
>
> Total weights = 364384
>
> Previous null char=110 mapped to 109
>
> Continuing from /home/kh/tesstutorial/engoutput/eng.lstm
>
> Loaded 72/72 pages (1-72) of document 
>> /home/kh/tesstutorial/engtrain/eng.Arial.exp0.lstmf
>
> Loaded 72/72 pages (1-72) of document 
>> /home/kh/tesstutorial/engtrain/eng.Impact_Condensed.exp0.lstmf
>
> !int_mode_:Error:Assert failed:in file 
>> /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp,
>>  
>> line 249
>
> !int_mode_:Error:Assert failed:in file 
>> /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp,
>>  
>> line 249
>
> !int_mode_:Error:Assert failed:in file 
>> /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp,
>>  
>> line 249
>
> !int_mode_:Error:Assert failed:in file 
>> /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp,
>>  
>> line 249
>
>
>


