Hi, I want to have tesseract recognize images that I know contain a single word that is 8 characters long.
I found a few mentions of user_patterns here: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data, which seems to be the solution I need. I tried following the instructions, but I can't get the file to affect my output. As a sanity check, I set user_patterns to contain only the string "\d\d\d\d\d\d\d\d", which I thought should produce digit-only output, but it has no effect (I'm getting outputs that are 4 characters long and contain only letters). I also tried changing language_model_penalty_non_dict_word to 1.0 in an attempt to force tesseract to accept my user-defined dictionary, but that didn't work either. Does anybody have any idea what I could be doing wrong? Alternatively, is there any other way to limit tesseract to strings of a certain length?
Thanks for the help
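
To make the "any other way" part concrete, here is a rough sketch of one possible workaround rather than a confirmed fix: run tesseract in single-word mode with a digit whitelist, then reject any result that is not exactly 8 digits. It assumes a build that accepts `stdout` as the output base and the `-c` option, and that spells the page segmentation flag `-psm` (newer releases use `--psm`). Whether the whitelist is honored may depend on the version and engine. The helper names (`build_command`, `accept`, `recognize`) are made up for illustration.

```python
import re
import subprocess

# Accept exactly eight ASCII digits, nothing else.
EXPECTED = re.compile(r"[0-9]{8}")


def build_command(image_path, psm_flag="-psm"):
    """Build a tesseract command line (hypothetical helper).

    Mode 8 treats the image as a single word. The whitelist variable
    is an assumption about what the installed version supports.
    """
    return [
        "tesseract", image_path, "stdout",
        psm_flag, "8",
        "-c", "tessedit_char_whitelist=0123456789",
    ]


def accept(text):
    """Return the stripped text if it is exactly 8 digits, else None."""
    candidate = text.strip()
    return candidate if EXPECTED.fullmatch(candidate) else None


def recognize(image_path):
    """Run tesseract and keep the result only if it passes the length check."""
    result = subprocess.run(
        build_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return accept(result.stdout)
```

This only filters results after the fact. It does not make tesseract itself prefer 8-character candidates, so it is not a substitute for getting user_patterns to load.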

