I've trained on 8000 samples with the set of commands below:

    echo "~/source/source.lstmf" > /home/j/trainingCurrentEng/data/list.eval
    echo "~/source/source.lstmf" > /home/j/trainingCurrentEng/data/list.train

    lstmtraining \
      --continue_from /home/j/trainingCurrentEng/data/checkpoints/eng_trained_checkpoint \
      --traineddata /home/j/trainingCurrentEng/data/eng.traineddata \
      --train_listfile /home/j/trainingCurrentEng/data/list.train \
      --eval_listfile /home/j/trainingCurrentEng/data/list.eval \
      --model_output /home/j/trainingCurrentEng/data/checkpoints/eng_trained \
      --learning_rate 0.001 \
      --debug_interval 10 \
      --max_iterations 8000000

    lstmtraining --stop_training \
      --continue_from /home/j/trainingCurrentEng/data/checkpoints/eng_trained_checkpoint \
      --traineddata /home/j/trainingCurrentEng/data/eng.traineddata \
      --model_output /home/j/trainingCurrentEng/data/eng_trained.traineddata

*IMPORTANT/RESULT:*

    tesseract source.tiff output_text -l eng --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
    cat output_text.txt
    *abcdef*

    tesseract source.tiff output_text_1 -l eng_trained --tessdata-dir /home/j/trainingCurrentEng/data --psm 7
    cat output_text_1.txt
    *laldlfk*

*Question:*

The first command (the stock eng model) gives the better result. After 8000 samples, my eng_trained model is distorted and reads the text completely wrong. However, if I run it on THE LAST sample used to train/update eng_trained, it reads that exact data flawlessly.

What am I doing wrong? How do I fix my current commands?

*IMPORTANT*: All my images use the same color palette: a black background with a white (close to gray) font, without any masks applied.
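
One thing I have not ruled out, sketched here under assumptions: if the commands above are run once per sample, each `echo ... >` overwrites list.train and list.eval, so every lstmtraining run sees a single sample, which might explain why only the last sample reads well. Also, `~` inside double quotes is not expanded by the shell, so the list files would contain a literal `~`. The sketch below builds one list of all .lstmf files with absolute paths instead. The directory layout, the file names and the 90/10 split are hypothetical, and empty dummy files stand in for real samples.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one directory holding every sample's .lstmf file.
# Empty dummy files stand in for real samples so the sketch runs on its own.
work_dir="$(mktemp -d)"
samples_dir="$work_dir/samples"
mkdir -p "$samples_dir"
for i in $(seq 1 10); do
  : > "$samples_dir/sample_$i.lstmf"
done

# Collect absolute paths for all samples, in a stable order.
find "$samples_dir" -name '*.lstmf' | sort > "$work_dir/all.lstmf.list"

# Assumed 90/10 split: the last 10% of the lines is held out for evaluation.
total=$(wc -l < "$work_dir/all.lstmf.list")
eval_count=$(( total / 10 ))
if [ "$eval_count" -lt 1 ]; then eval_count=1; fi
train_count=$(( total - eval_count ))

head -n "$train_count" "$work_dir/all.lstmf.list" > "$work_dir/list.train"
tail -n "$eval_count" "$work_dir/all.lstmf.list" > "$work_dir/list.eval"

echo "list files written to $work_dir"
```

The resulting list.train and list.eval would then be passed a single time to --train_listfile and --eval_listfile, rather than re-running lstmtraining for each sample.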

    j@j-Aspire-A515-58M:~/source$ ls
    source.box  source.lstmf  source.tiff  source.txt  unicharset

Source Box:

    a 251 52 355 178 0
    b 356 51 444 176 0
    c 446 22 530 175 0
    d 534 22 622 173 0
    e 626 60 766 174 0
    f 768 59 870 173 0

source.txt

    *abcdef*

unicharset

    9
    NULL 0 Common 0
    Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
    |Broken|0|1 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
    a 3 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 a # a [6f ]a
    b 3 0,255,0,255,0,0,0,0,0,0 Latin 4 0 4 b # b [65 ]a
    c 3 0,255,0,255,0,0,0,0,0,0 Latin 5 0 5 c # c [64 ]a
    d 3 0,255,0,255,0,0,0,0,0,0 Latin 6 0 6 d # d [6b ]a
    e 3 0,255,0,255,0,0,0,0,0,0 Latin 7 0 7 e # e [6d ]a
    f 3 0,255,0,255,0,0,0,0,0,0 Latin 8 0 8 f # f [63 ]a
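
As a sanity check on the sample data, here is a minimal sketch that compares the glyph labels in the box file with the ground-truth text. It assumes the box lines are space-separated with the label in the first column, and that source.txt contains plain abcdef (the asterisks above being emphasis only). The temporary directory and variable names are hypothetical, and a box file that labels a space character would need different parsing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the sample files from the post in a temporary directory.
check_dir="$(mktemp -d)"
cat > "$check_dir/source.box" <<'EOF'
a 251 52 355 178 0
b 356 51 444 176 0
c 446 22 530 175 0
d 534 22 622 173 0
e 626 60 766 174 0
f 768 59 870 173 0
EOF
printf 'abcdef\n' > "$check_dir/source.txt"

# The first column of each box line is the glyph label; join them into one string.
box_text="$(cut -d' ' -f1 "$check_dir/source.box" | tr -d '\n')"
truth_text="$(tr -d '\n' < "$check_dir/source.txt")"

if [ "$box_text" = "$truth_text" ]; then
  box_status="match"
else
  box_status="mismatch"
fi
echo "box labels: $box_text, ground truth: $truth_text, $box_status"
```

Running the same comparison over every sample directory could flag box files whose labels drifted from their transcription before they are turned into .lstmf files.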