Hello,

I followed the "Training From Scratch" tutorial, using langdata_lstm and 
tesstrain.sh.
When I ran tesstrain.sh, it failed with a "Segmentation fault" error.

Error log:
=== Phase E: Generating lstmf files ===
Loaded 89754/89754 lines (1-89754) of document /tmp/chi_tra-2021-09-09.CGU/chi_tra.AR_PL_UKai_TW.exp0.lstmf
tesseract/src/training/tesstrain_utils.sh: line 73: 3787663 Segmentation fault      (core dumped) "${cmd}" "$@" 2>&1
     3787664 Done                    | tee -a "${LOG_FILE}"
ERROR: Program tesseract failed. Abort.

I have three questions about this error:
1. Was tessdata_best/lang.traineddata trained with langdata_lstm and 
tesstrain.sh?
2. How can I reproduce tessdata_best/lang.traineddata?
3. If the training_text file is too large, how can I avoid this error?
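
To make question 3 concrete, here is a minimal Python sketch of the kind of workaround I have in mind: splitting the training text into smaller files before running tesstrain.sh on each piece. Whether this actually avoids the segmentation fault is only my assumption. The function names, the output file naming, and the chunk size of 10000 lines are hypothetical and do not come from the tutorial.

```python
from pathlib import Path


def _write_chunk(lines, out_dir, stem, index):
    # Hypothetical naming scheme: <stem>.part000.txt, <stem>.part001.txt, ...
    path = out_dir / f"{stem}.part{index:03d}.txt"
    path.write_text("".join(lines), encoding="utf-8")
    return path


def split_training_text(src, out_dir, max_lines=10000):
    """Split src into files of at most max_lines lines each.

    Returns the list of created paths, in order.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    chunk = []
    with src.open(encoding="utf-8") as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == max_lines:
                parts.append(_write_chunk(chunk, out_dir, src.stem, len(parts)))
                chunk = []
    # Write whatever is left over as the final, shorter chunk.
    if chunk:
        parts.append(_write_chunk(chunk, out_dir, src.stem, len(parts)))
    return parts
```

Each resulting file could then be used as a separate, smaller training text, for example `split_training_text("chi_tra.training_text", "chunks")`.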

Thank you in advance!

Environment:
Ubuntu 20.04
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
