[tesseract-ocr] could not update eng_custom.traineddata properly

Mitya Tue, 18 Mar 2025 22:49:15 -0700

*Description:*
I have some text, a single word per TIFF file, intended for training 
eng_custom.traineddata.
Currently I use the commands below, which seem sane and do not produce any 
error before the last step.


*Important*:
I don't want to change [1], as my goal is to train on each of 1000 TIFF files 
with the same parameters, since I prepared the corresponding tessRead and 
boxes for each TIFF.

[1]
tesseract test_sample.tiff test_sample \
  --tessdata-dir /home/j/img2/tess_files \
  --psm 7 --oem 1 -l eng_custom \
  /home/j/tesseract/tessdata/configs/lstm.train
  
echo "test_sample.lstmf" > single_lstmf_file.txt

[2]
# Train LSTM model
lstmtraining \
  --model_output tess_training.lstm \
  --continue_from /home/j/img2/tess_files/eng.lstm \
  --traineddata /home/j/img2/tess_files/eng_custom.traineddata \
  --train_listfile single_lstmf_file.txt \
  --max_iterations 1
  

# Stop training and finalize model
lstmtraining --stop_training \
  --continue_from tess_training.lstm_checkpoint \
  --traineddata /home/j/img2/tess_files/eng_custom.traineddata \
  --model_output /home/j/img2/tess_files/eng_final.lstm




# Update traineddata with new LSTM model
mkdir -p /home/j/img2/base_model
combine_tessdata -u /home/j/img2/tess_files/eng_custom.traineddata \
  /home/j/img2/base_model/eng_custom
cp /home/j/img2/tess_files/eng_final.lstm /home/j/img2/base_model/eng.lstm
combine_tessdata /home/j/img2/base_model/eng_custom
cp /home/j/img2/base_model/eng_custom.traineddata \
  /home/j/img2/tess_files/eng_custom.traineddata

*But I have a problem after the final step:*

j@j:~/t$ tesseract test_sample.tiff stdout -l eng_custom --tessdata-dir /home/j/img2/tess_files/
index >= 0:Error:Assert failed:in file /home/j/tesseract4/src/ccutil/strngs.cpp, line 266
Aborted (core dumped)

*Question:*
How can I amend the commands above so that eng_final.lstm is combined with 
eng_custom.traineddata?
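
As a diagnostic before recombining, the component list of each file involved 
can be printed. This is a minimal sketch, assuming the installed 
combine_tessdata supports the -d (list components) option, as Tesseract 4 
builds do. The helper name list_components is hypothetical and not part of 
Tesseract.

```shell
# Hypothetical helper: print the component directory of each file given.
# Assumes combine_tessdata is on PATH and accepts -d (list components).
list_components() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      # Report missing inputs instead of letting combine_tessdata fail.
      echo "missing: $f"
      continue
    fi
    echo "== $f"
    combine_tessdata -d "$f"
  done
}
```

Calling list_components on /home/j/img2/tess_files/eng_custom.traineddata 
and /home/j/img2/tess_files/eng_final.lstm may help show whether 
eng_final.lstm is a bare LSTM component or a complete traineddata container.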

*environment:*

/home/j/img2/tess_files/
eng.traineddata
eng_custom.traineddata
eng.lstm
eng_final.lstm


/home/j/img2/base_model/

eng_custom.bigram-dawg       eng_custom.normproto      eng_custom.word-dawg
eng_custom.freq-dawg         eng_custom.number-dawg    eng.lstm
eng_custom.inttemp           eng_custom.pffmtable      eng.lstm-number-dawg
eng_custom.lstm              eng_custom.punc-dawg      eng.lstm-punc-dawg
eng_custom.lstm-number-dawg  eng_custom.shapetable     eng.lstm-recoder
eng_custom.lstm-punc-dawg    eng_custom.traineddata    eng.lstm-unicharset
eng_custom.lstm-recoder      eng_custom.unicharambigs  eng.lstm-word-dawg
eng_custom.lstm-unicharset   eng_custom.unicharset     eng.version
eng_custom.lstm-word-dawg    eng_custom.version
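
The listing above mixes two prefixes (eng_custom.* and eng.*). Since 
combine_tessdata is given a path prefix and collects the component files that 
start with it, files under a different prefix are presumably not picked up. 
The sketch below lists such files. The function name find_foreign_prefix is 
hypothetical, and the directory and prefix are taken from the commands above.

```shell
# Hypothetical helper: print every file in DIR whose name does not
# start with "PREFIX." (e.g. eng.lstm when the prefix is eng_custom).
find_foreign_prefix() {
  dir=$1
  prefix=$2
  for f in "$dir"/*; do
    [ -e "$f" ] || continue        # skip the literal pattern if DIR is empty
    name=$(basename "$f")
    case "$name" in
      "$prefix".*) ;;              # expected prefix, nothing to report
      *) echo "$name" ;;           # different prefix
    esac
  done
}
```

For this setup the call would be 
find_foreign_prefix /home/j/img2/base_model eng_custom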



*Kindly Advise*
Mitya


  
  
