I'm using Cygwin (64-bit, on Windows 10) to compile Tesseract. I ran the following commands and got the following error:

kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Arial" "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engtrain

=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:17:48 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --text=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g
Rendered page 0 to file /tmp/font_tmp.bEkR4qa83g/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Arial
[Mon, Feb 4, 2019 1:17:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --max_pages=0 --font=Arial --text=/home/kh/langdata//eng/eng.training_text
Rendering using Impact Condensed
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
[Mon, Feb 4, 2019 1:17:52 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.pCA/eng.unicharset
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.pCA/eng.unicharset -O /tmp/eng-2019-02-04.pCA/eng.unicharset -X /tmp/eng-2019-02-04.pCA/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.pCA/eng.unicharset

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --psm 6 lstm.train
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf
Loaded 52/52 pages (1-52) of document /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf

=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:17:57 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engtrain --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg

=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf to /home/kh/tesstutorial/engtrain
Moving /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engtrain

Created starter traineddata for language 'eng'

Run lstmtraining to do the LSTM training for language 'eng'

kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engeval

=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:21:10 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Impact Condensed --outputbase=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --text=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5
Rendered page 0 to file /tmp/font_tmp.e96rRhOoQ5/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Impact Condensed
[Mon, Feb 4, 2019 1:21:14 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 0 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:21:16 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.TL6/eng.unicharset
[Mon, Feb 4, 2019 1:21:17 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.TL6/eng.unicharset -O /tmp/eng-2019-02-04.TL6/eng.unicharset -X /tmp/eng-2019-02-04.TL6/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.TL6/eng.unicharset

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:21:17 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf

=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:21:19 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engeval --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg

=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engeval

Created starter traineddata for language 'eng'

Run lstmtraining to do the LSTM training for language 'eng'

kh@DSAD-6 /usr/share/tessdata
$ combine_tessdata -e ./eng.traineddata ~/tesstutorial/engoutput/eng.lstm
Extracting tessdata components from ./eng.traineddata
Wrote /home/kh/tesstutorial/engoutput/eng.lstm
Version string:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054

kh@DSAD-6 /usr/share/tessdata
$ lstmtraining --model_output ~/tesstutorial/engoutput/impact --continue_from ~/tesstutorial/engoutput/eng.lstm --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata --old_traineddata ./eng.traineddata --max_iterations 3600 -train_listfile ~/tesstutorial/engtrain/eng.training_files.txt
Loaded file /home/kh/tesstutorial/engoutput/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 110!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc110:110, 0
Total weights = 364384
Previous null char=110 mapped to 109
Continuing from /home/kh/tesstutorial/engoutput/eng.lstm
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Arial.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Impact_Condensed.exp0.lstmf
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
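For anyone trying to reproduce this, the four commands from the session above can be collected into one script. This is only a sketch: the `DRY_RUN` switch and the helper names (`run`, `make_data`, `extract_lstm`, `fine_tune`) are hypothetical additions for illustration, while every path and flag is copied from the log. It does not change or work around the assertion; it only makes the sequence easier to rerun.

```shell
#!/bin/sh
# Sketch only: replays the command sequence from the session above.
# DRY_RUN and the function names are hypothetical; paths and flags come from the log.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command (default), 0 = also execute it

run() {
  # Print the command for reading (quoting is not preserved in the printout),
  # then execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

make_data() {
  # $1 = output directory; remaining arguments = font names passed to --fontlist
  out_dir="$1"; shift
  run tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "$@" \
    --lang eng --linedata_only --noextract_font_properties \
    --langdata_dir "$HOME/langdata/" --tessdata_dir ./ \
    --output_dir "$out_dir"
}

extract_lstm() {
  # Pull the LSTM component out of the installed eng.traineddata.
  run combine_tessdata -e ./eng.traineddata "$HOME/tesstutorial/engoutput/eng.lstm"
}

fine_tune() {
  # The step that ends in the !int_mode_ assertion in the log.
  run lstmtraining --model_output "$HOME/tesstutorial/engoutput/impact" \
    --continue_from "$HOME/tesstutorial/engoutput/eng.lstm" \
    --traineddata "$HOME/tesstutorial/engtrain/eng/eng.traineddata" \
    --old_traineddata ./eng.traineddata \
    --max_iterations 3600 \
    -train_listfile "$HOME/tesstutorial/engtrain/eng.training_files.txt"
}

# The session was run from /usr/share/tessdata; only change directory for a real run.
if [ "$DRY_RUN" = "0" ]; then cd /usr/share/tessdata || exit 1; fi

make_data "$HOME/tesstutorial/engtrain" "Arial" "Impact Condensed" &&
make_data "$HOME/tesstutorial/engeval" "Impact Condensed" &&
extract_lstm &&
fine_tune
```

With the default `DRY_RUN=1` the script only prints the four commands; running it as `DRY_RUN=0 sh script.sh` (the file name is arbitrary) executes them in the same order as the session.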

