I am working with Tesseract OCR and want to experiment with different binarization methods, such as Otsu's thresholding and other custom filters, to improve text recognition accuracy.
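Otsu's method chooses the single global threshold that maximizes the between-class variance of the grayscale histogram. The "binary" and "binary inverted" variants differ only in which side of that threshold becomes white. Below is a minimal pure-Python sketch of the idea. It assumes 8-bit grayscale pixels supplied as a flat list of ints, and `otsu_threshold` and `binarize` are illustrative names, not Tesseract functions. On real images, OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag (combined with `THRESH_BINARY` or `THRESH_BINARY_INV`) computes the same kind of result.

```python
def otsu_threshold(pixels):
    """Return Otsu's global threshold for a flat list of 8-bit grayscale values.

    Pixels <= threshold form the dark class, pixels > threshold the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class (values <= t)
    sum0 = 0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # bright class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold, invert=False):
    """Map pixels > threshold to 255 and the rest to 0 (swapped when invert=True)."""
    above, below = (0, 255) if invert else (255, 0)
    return [above if p > threshold else below for p in pixels]


if __name__ == "__main__":
    sample = [10, 12, 15, 11, 200, 210, 205, 198]
    t = otsu_threshold(sample)
    print(t)                                # 15
    print(binarize(sample, t))              # [0, 0, 0, 0, 255, 255, 255, 255]
    print(binarize(sample, t, invert=True)) # [255, 255, 255, 255, 0, 0, 0, 0]
```

Because Otsu only picks a threshold, the "binary" and "binary inverted" images carry the same information with the polarity flipped, which is worth keeping in mind when comparing how a model trained on mostly dark-on-light text handles each variant.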
However, I am concerned that training with these different preprocessing techniques might modify or overwrite eng.traineddata, which I want to keep intact.

*My questions are:*
1. Does training a new model affect the existing eng.traineddata file?
2. How can I safely train Tesseract with new filters without modifying the default English model?
3. Is there a recommended approach to train Tesseract on preprocessed images while keeping eng.traineddata unchanged?

*What I've tried:* I updated my current eng_new.traineddata using three samples, each preprocessed with a different filter: Otsu, Otsu_Tresh_Binary, and Otsu_Tresh_Binary_Inv. After the first 1000 iterations the target traineddata differed from the initial one, but the target traineddata gave slightly worse results.

Training command:

lstmtraining \
  --continue_from /home/j/trainingCurrentEng/data/checkpoints/eng_trained \
  --traineddata /home/j/trainingCurrentEng/data/eng.traineddata \
  --train_listfile /home/j/trainingCurrentEng/data/list.train \
  --eval_listfile /home/j/trainingCurrentEng/data/list.eval \
  --model_output /home/j/trainingCurrentEng/data/checkpoints/eng_trained \
  --learning_rate 0.0001 \
  --debug_interval 10 \
  --max_iterations 600

Recognition with eng:

tesseract otsu_tresh_binary_inv.tiff output_text -l eng --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
cat output_text.txt
Abcd123

Recognition with eng_trained:

tesseract otsu_tresh_binary_inv.tiff output_text_1 -l eng_trained --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
cat output_text_1.txt
Abc

I would appreciate any guidance or best practices for training custom models without interfering with existing ones.
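One way to check empirically whether a training run touches eng.traineddata is to record a checksum of the file before training and compare it afterwards. Below is a minimal sketch using only Python's standard library. `sha256_of` and `check_unchanged` are hypothetical helper names. The check only tells you whether the file's bytes changed; where lstmtraining writes its results is governed by `--model_output`.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large .traineddata file is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_unchanged(path, digest_before):
    """Raise RuntimeError if the file's current digest differs from the recorded one."""
    digest_after = sha256_of(path)
    if digest_after != digest_before:
        raise RuntimeError(f"{path} changed: {digest_before} -> {digest_after}")
    return digest_after
```

Usage, assuming the paths from the lstmtraining command above: call `sha256_of("/home/j/trainingCurrentEng/data/eng.traineddata")` and store the result, run lstmtraining, then call `check_unchanged` with the same path and the stored digest.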