
Following the tutorial "Training From Scratch", I used langdata_lstm and got a
"Segmentation fault" error when I executed tesstrain.sh.

Error log:
=== Phase E: Generating lstmf files ===
Loaded 89754/89754 lines (1-89754) of document 
tesseract/src/training/tesstrain_utils.sh: line 73: 3787663 Segmentation fault      (core dumped) "${cmd}" "$@" 2>&1
     3787664 Done                    | tee -a "${LOG_FILE}"
ERROR: Program tesseract failed. Abort.

I have three questions about this error:
1. Was tessdata_best/lang.traineddata trained on langdata_lstm and 
2. How can I reproduce tessdata_best/lang.traineddata?
3. If the training_text is too large, how can I avoid this error?
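
For question 3, one workaround I am considering is to split training_text into smaller files and run tesstrain.sh on each one separately. This is untested, and it assumes the size of the input is what triggers the segfault, which I have not confirmed. A minimal Python sketch; the function name split_training_text, the default of 10000 lines per chunk, and the .partNNN file names are my own hypothetical choices, not anything from the tutorial:

```python
from pathlib import Path


def split_training_text(src, out_dir, lines_per_chunk=10000):
    """Split a training_text file into smaller chunk files.

    Hypothetical helper: chunk size and naming are arbitrary choices.
    Returns the list of chunk paths in order.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep line endings so the concatenated chunks equal the original text.
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        index = start // lines_per_chunk
        # e.g. lang.training_text -> lang.part000.training_text
        part = out_dir / f"{src.stem}.part{index:03d}{src.suffix}"
        part.write_text("".join(lines[start:start + lines_per_chunk]), encoding="utf-8")
        chunks.append(part)
    return chunks
```

For example, split_training_text("lang.training_text", "chunks") would write lang.part000.training_text, lang.part001.training_text, and so on, each with at most 10000 lines. If the smaller runs succeed, that would at least suggest the input size is related to the crash.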

Thank you in advance!

Ubuntu 20.04
tesseract 4.1.1
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 


You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.