I am totally puzzled by how the confidence reported at the word level relates to the confidences assigned to the characters of the same word.
I used the attached TIFF image to recognize a simple MICR line of a check. The recognized text had two words:

495096 700000b01b205xX0eL00007010717

The word confidences were 59% and 38% respectively. The confidences for the characters of the first word were (rounded):

4 97%
9 99%
5 100%
0 100%
9 99%
6 96%

With such high confidence scores for the individual characters, how can the word-level confidence come out at 59%?

I ran this test using the fast training data for English with no training of my own. I am not worried about the accuracy, just curious about how to interpret the confidence scores.

Thanks!
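To make the gap concrete, here is a minimal Python sketch that combines the rounded character confidences listed above in three simple ways: arithmetic mean, minimum, and product of probabilities. These three aggregations and the helper names are assumptions chosen only for illustration; nothing here is claimed to be how Tesseract actually derives its word confidence.

```python
from math import prod

# Rounded per-character confidences (percent) reported for the word "495096".
chars = [("4", 97), ("9", 99), ("5", 100), ("0", 100), ("9", 99), ("6", 96)]
values = [conf for _, conf in chars]

# Word confidence reported for the same word (percent).
reported_word_conf = 59


def mean_conf(confs):
    """Arithmetic mean of the character confidences, in percent."""
    return sum(confs) / len(confs)


def min_conf(confs):
    """Lowest character confidence, in percent."""
    return min(confs)


def product_conf(confs):
    """Product of the character confidences treated as probabilities, in percent."""
    return 100 * prod(c / 100 for c in confs)


if __name__ == "__main__":
    for name, fn in [("mean", mean_conf), ("min", min_conf), ("product", product_conf)]:
        print(f"{name:8s} {fn(values):6.2f}%  (reported word confidence: {reported_word_conf}%)")
```

All three candidates land above 90%, so none of these simple combinations of the rounded character scores accounts for the reported 59%.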