Hi,

*I get some errors when I follow this tutorial to retrain Tesseract:*
I am following this guide to retrain Tesseract on my own image dataset (I am training from real images, not from text rendered via tesstrain.sh):
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata

These are my steps to retrain the Tesseract LSTM model:

*Step 1: I create my training data (tif image + box file) from my images.*
I generated the box files with this command:

    tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] batch.nochop makebox

*Step 2: I edit the box files manually with Qt-box-editor.*
(I followed this link: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-Make-Box-Files )

So now I have these files:
.tif file
.box file
.lstmf file, generated by this command:

    tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] lstm.train

unicharset file

*Step 3: I create the starter .traineddata with this command:*

    combine_lang_model --input_unicharset unicharset --script_dir langdata --output_dir output --lang "eng"

The langdata directory was downloaded from here: https://github.com/tesseract-ocr/langdata

*Step 4: I extract the existing LSTM model from the existing traineddata with this command:*

    combine_tessdata -e /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata eng.lstm

*Step 5: I retrain Tesseract* (Fine Tuning for ± a few characters: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters) with this command:

    lstmtraining --model_output output_model --continue_from eng.lstm --traineddata output_basic/eng/eng.traineddata --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata --train_listfile eng.training_files.txt --debug_interval -1 --max_iterations 400

This is the format of my eng.training_files.txt:

    path/to/lstmf

*I get an error like the following:*
[image: Screenshot from 2018-10-19 21-49-00.png]

*This is an example of my training images:*
[image: eng.centurygothic.exp0.png]

*I am trying to retrain Tesseract from real images (not from a text file via tesstrain.sh).*

Please share
me something if you have any idea how to fix it. Thank you in advance!
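
For reference, the list file passed to --train_listfile in Step 5 holds one .lstmf path per line. Below is a minimal sketch of how such a list could be built and sanity-checked before training. The helper names build_training_list and check_training_list are hypothetical (they are not part of Tesseract), and the sketch assumes the .lstmf paths contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): write one .lstmf path per
# line into a list file.
# Usage: build_training_list <dir-with-lstmf-files> <list-file>
build_training_list() {
    dir=$1
    list=$2
    # find prints one path per line; sort keeps the order stable.
    find "$dir" -type f -name '*.lstmf' | sort > "$list"
}

# Hypothetical helper: report listed files that are missing or empty.
# Returns non-zero if the list itself is empty or any entry is unusable.
# Usage: check_training_list <list-file>
check_training_list() {
    list=$1
    rc=0
    if [ ! -s "$list" ]; then
        echo "empty or missing list: $list" >&2
        return 1
    fi
    while IFS= read -r f; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done < "$list"
    return "$rc"
}
```

If the check passes, the list file could then be given to lstmtraining through --train_listfile as in Step 5. A failing entry would suggest that the lstm.train run for that image did not produce usable output.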