[tesseract-ocr] Setting User_Patterns does not seem to affect output. (Alternatively, are there any ways to limit tesseract to output strings that are a specific length?)

David Orshan Mon, 02 Mar 2015 09:33:46 -0800

Hi,

I want to have tesseract recognize images that I know contain a single word 
that is 8 characters long.


I found a few mentions of user_patterns here: 
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
which seems to be the solution I need, so I tried following the 
instructions, but I can't seem to get the file to affect my output. As a 
sanity check, I set user_patterns to contain only the pattern 
"\d\d\d\d\d\d\d\d", which I thought should restrict the output to 8 digits, 
but it has no effect (I'm getting outputs that are 4 characters long and 
contain only letters). I also tried changing 
language_model_penalty_non_dict_word to 1.0 in an attempt to force 
tesseract to accept my user-defined dictionary, but that didn't work either.
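
For reference, here is a rough sketch of the setup described above, written as a small Python helper that generates the two files. It assumes the convention from the linked man page section: a config file containing a `user_patterns_suffix` line, plus a pattern file named `<lang>.<suffix>` in the tessdata directory. The helper names, the `patterns8` config name, and the directory layout are hypothetical and may not match a given installation.

```python
from pathlib import Path


def digit_pattern(length):
    """Return one user-pattern line matching exactly `length` digits."""
    return r"\d" * length


def config_text(suffix="user-patterns"):
    """Return the config line that points tesseract at the pattern file.

    Assumption: `user_patterns_suffix` is the variable described in the
    man page section linked above.
    """
    return f"user_patterns_suffix {suffix}\n"


def write_pattern_files(tessdata_dir, length=8, lang="eng",
                        suffix="user-patterns", config_name="patterns8"):
    """Write <lang>.<suffix> and configs/<config_name> under tessdata_dir.

    `config_name` is a made-up name for illustration only.
    """
    tessdata = Path(tessdata_dir)
    pattern_path = tessdata / f"{lang}.{suffix}"
    config_path = tessdata / "configs" / config_name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    pattern_path.write_text(digit_pattern(length) + "\n")
    config_path.write_text(config_text(suffix))
    return pattern_path, config_path
```

With files like these in place, the config name would be passed as the last argument on the command line, for example `tesseract image.png out patterns8` (file names here are placeholders).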

Does anybody have any idea what I could be doing wrong? Alternatively, is 
there any other way to limit tesseract to strings that are a certain length?
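
One fallback that does not depend on tesseract at all is to check the recognized text after the fact and reject anything that is not the expected length. The sketch below is only a post-processing filter, not a tesseract feature; the function name and parameters are made up for illustration.

```python
import re


def matches_expected_shape(text, length=8, digits_only=False):
    """Return True if `text` is a single token of exactly `length` characters.

    Surrounding whitespace (such as a trailing newline in OCR output)
    is ignored. With digits_only=True, every character must be a digit.
    """
    candidate = text.strip()
    char_class = r"\d" if digits_only else r"\S"
    return re.fullmatch(rf"{char_class}{{{length}}}", candidate) is not None
```

A caller could run the OCR step, pass the result through this check, and retry with different settings or flag the image for review when it returns False.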

Thanks for the help

