Hi,
Tesseract seems to post-process its predictions. Here is what I get after running OCR on images (same font, same size, all generated with text2image):

- an image containing "12345678I" => `123456781`
- an image containing "GLOTHUVFI" => `GLOTHUVFI`
- an image containing "12345678H" => `12345678H`
- an image containing "GLOTHUVFH" => `GLOTHUVFH`
- an image containing "12345678A" => `123456784`
- an image containing "GLOTHUVFA" => `GLOTHUVFA`

It looks like Tesseract doesn't like a word made of digits with a single letter at the end. In fact, if the letter looks like a digit ("I" and "A" look like "1" and "4" respectively), it replaces it with the closest-looking digit. The same letters are recognized correctly at the end of an all-letter word, and "H" is kept in both cases.

I have tried tuning the following parameters, without any change in the result:

- segment_penalty_dict_frequent_word
- language_model_penalty_chartype

Thanks for any help.

Regards
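
P.S. In case it helps anyone reproduce the comparison, here is a minimal Python sketch of how the mismatches in the results listed above can be tabulated. It does not call Tesseract; it only compares the expected strings with the outputs I reported. The names `RESULTS`, `substitutions` and `letter_to_digit_errors` are hypothetical helpers made up for this illustration.

```python
# (expected text, Tesseract output) pairs copied from the results in this post.
RESULTS = [
    ("12345678I", "123456781"),
    ("GLOTHUVFI", "GLOTHUVFI"),
    ("12345678H", "12345678H"),
    ("GLOTHUVFH", "GLOTHUVFH"),
    ("12345678A", "123456784"),
    ("GLOTHUVFA", "GLOTHUVFA"),
]


def substitutions(expected, recognized):
    """Return (position, expected_char, recognized_char) for each mismatch.

    Assumes both strings have the same length, which holds for the cases here.
    """
    return [
        (i, e, r)
        for i, (e, r) in enumerate(zip(expected, recognized))
        if e != r
    ]


def letter_to_digit_errors(results):
    """Keep only the mismatches where a letter was recognized as a digit."""
    errors = {}
    for expected, recognized in results:
        hits = [
            (i, e, r)
            for i, e, r in substitutions(expected, recognized)
            if e.isalpha() and r.isdigit()
        ]
        if hits:
            errors[expected] = hits
    return errors


if __name__ == "__main__":
    for word, hits in letter_to_digit_errors(RESULTS).items():
        for pos, exp_char, got_char in hits:
            print(f"{word}: position {pos}: {exp_char!r} read as {got_char!r}")
```

Only the two mixed digit/letter words ending in "I" and "A" are flagged; the all-letter words and the "H" cases come back unchanged.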