Check the unicharset file to see if all the characters you want to recognize are there.
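A rough way to automate that check, once the unicharset has been unpacked with the combine_tessdata -u command that follows. This is only a sketch: missing_chars is a hypothetical helper name, not a Tesseract tool, and it assumes the plain-text unicharset layout in which line 1 is the entry count and every later line starts with the character followed by a space. The space character is stored under a special name, so it is skipped here. Non-ASCII characters need a UTF-8 locale for grep to split them correctly.

```shell
# Print every character of $2 that has no entry in the unicharset file $1.
# Hypothetical helper, e.g.: missing_chars output_dir/lstm-unicharset 'abc:;'
missing_chars() {
  # Split the wanted string into one non-space character per line, deduplicated.
  printf '%s\n' "$2" | grep -o '[^[:space:]]' | sort -u |
  while IFS= read -r ch; do
    # Skip the count on line 1, keep field 1 (the character) of each entry,
    # and look for an exact fixed-string match on the whole field.
    if ! tail -n +2 "$1" | cut -d ' ' -f 1 | grep -qxF -- "$ch"; then
      printf '%s\n' "$ch"
    fi
  done
}
```

If it prints nothing, every requested character already has an entry; anything it prints would have to come in through the unicharset merge.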
To unpack the model and print its unicharset (the second argument is used as a path prefix, hence the trailing slash):

    combine_tessdata -u trained_model.traineddata output_dir/
    cat output_dir/*unicharset

Otherwise you need to merge the old one with the new one before training. This is how ocrd-train <https://github.com/OCR-D/ocrd-train> does it (you could try to use it, BTW):

    combine_tessdata -u $(TESSDATA)/$(CONTINUE_FROM).traineddata $(TESSDATA)/$(CONTINUE_FROM).
    unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
    merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset merged.unicharset

my.unicharset is the new one, something.lstm-unicharset is the old one, NORM_MODE = 2, and ALL_BOXES is a file with all the box file names.

And then something like this:

    combine_tessdata -o continue_from.traineddata merged.unicharset

It's probably the same thing that Qt-box-editor does. I never tried this; I use ocrd, which does things in a slightly different way.

At the very beginning of training, lstmtraining will print a message if the set of characters is different from that of the previous model.

Bye

Lorenzo

On Sat, 27 Oct 2018 at 08:04, tu tonquang <tonquangt...@gmail.com> wrote:

> It's similar to my problem. It recognizes special characters (the newly
> trained data) well, but wrongly recognizes normal characters and words.
>
> On Sat, 27 Oct 2018 at 11:29, Sreehari B S <sreeharib...@gmail.com> wrote:
>
>> Hi,
>>
>> Something similar happened when I fine-tuned for ":". When doing OCR, it
>> recognized some ":" as "1", so I fine-tuned for that.
>>
>> Now when I OCR ":", it works well. When I OCR some real data, it's now
>> worse than the previous one.
>>
>> * I trained on the best eng.traineddata
>> * I created boxes using the tesseract makebox command and edited them
>>   using jTessBoxEditor, but the box dimensions were not perfect.
>> Note: I trained from a real image. (Do I really need to edit the
>> coordinates by hand to adjust the dimensions?)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f89a3852-3d89-477f-ad58-6cf2cea12aab%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.