[tesseract-ocr] tesseract pattern not enforced?

Matteo Acquarone, Thu, 17 Sep 2020 00:29:40 -0700

 

Hello,


I'm using Tesseract 4.1.0.0 to OCR a text field on the target that contains
codes following a fixed pattern (implemented as a user patterns file, in
Tesseract terms):
P\n\n\n\n
C\n\n\n\n
B\n\n\n\n
U\n\n\n\n
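One possible reason the pattern feels loose: according to the comments in Tesseract's trie.h, the \n escape in a patterns file matches any character that is a letter or a digit, not only hex digits (\d is digits only, \c letters only, \a and \A lowercase and uppercase letters). The sketch below is a hypothetical helper, pattern_to_regex, that approximates these escapes with ASCII Python regex classes so a pattern line can be checked by hand. It is not part of Tesseract, it does not handle \p or \*, and the real matcher uses the unicharset's character properties rather than ASCII ranges.

```python
import re

# Hypothetical helper: ASCII-only approximation of Tesseract's pattern escapes.
# The real matcher checks the unicharset's isalpha/isdigit/islower/isupper flags.
_ESCAPES = {
    "c": "[A-Za-z]",     # \c: any letter
    "d": "[0-9]",        # \d: any digit
    "n": "[A-Za-z0-9]",  # \n: any letter OR digit (not restricted to hex)
    "a": "[a-z]",        # \a: lowercase letter
    "A": "[A-Z]",        # \A: uppercase letter
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple Tesseract pattern line into a regex (use fullmatch)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash in pattern")
            esc = pattern[i + 1]
            if esc not in _ESCAPES:
                raise ValueError(f"unsupported escape: \\{esc}")
            parts.append(_ESCAPES[esc])
            i += 2
        else:
            parts.append(re.escape(ch))  # literal character, e.g. the leading P
            i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    rx = pattern_to_regex(r"P\n\n\n\n")
    for text in ["P0123", "P2EFD", "PZZZZ", "Pxyz9", "PPB"]:
        print(text, bool(rx.fullmatch(text)))
```

With this reading, P\n\n\n\n would also accept strings such as PZZZZ, so the patterns file alone describes a wider set than "P plus 4 hex digits".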

In practice, each code is one letter (P, C, B or U) followed by 4 hex
digits, so the length is always exactly 5 characters.

So, at least as I intended this pattern file, correct output would look
like these examples:
P0123, P2EFD, C12EF, B2BCD and so on.
Running the OCR script thousands of times, I see that the vast majority of
the output is as expected, but I also get some results like PPB, PFF3, CC3
and so on.
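Since a patterns file seems to bias recognition toward matching strings rather than guarantee them (at least as far as I understand the 4.x LSTM engine), one practical fallback is to validate every result after OCR and flag or re-process the ones that don't fit. A minimal sketch, assuming the format is exactly one of P, C, B, U followed by four uppercase hex digits; CODE_RE and is_valid_code are illustrative names, not Tesseract APIs:

```python
import re

# Assumed format: one of P, C, B, U followed by exactly 4 uppercase hex digits.
CODE_RE = re.compile(r"[PCBU][0-9A-F]{4}")


def is_valid_code(text: str) -> bool:
    """Return True if the OCR result (ignoring surrounding whitespace) fits the format."""
    return CODE_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    samples = ["P0123", "P2EFD", "C12EF", "B2BCD", "PPB", "PFF3", "CC3"]
    for s in samples:
        print(s, "ok" if is_valid_code(s) else "reject")
```

Rejected results could then be retried with different preprocessing or sent for manual review; the check only detects bad reads, it does not correct them.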
Is there a way to enforce stricter adherence to the pattern? My current
settings are:
user_patterns_file=C:\Util\Code_OCR.Pattern
tessedit_char_whitelist=PCBU0123456789ABCDEF
tessedit_char_blacklist=abcdefGgHhIiLlMmNnOopQqRrSsTtuVvZzJjYyKkWw-!|
load_system_dawg=F
load_freq_dawg=F
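For reference, the same settings can be passed to the tesseract command-line tool: --user-patterns PATH points at the patterns file and each -c NAME=VALUE sets one variable. The sketch below only builds that argument list; build_tesseract_args and run_ocr are hypothetical helper names, and --psm 7 (treat the image as a single text line) is my assumption about a field holding one code. Passing a list to subprocess instead of a shell string avoids quoting problems with characters like | and ! in the blacklist.

```python
import subprocess
from typing import Dict, List

# Variables copied from the settings above.
SETTINGS: Dict[str, str] = {
    "tessedit_char_whitelist": "PCBU0123456789ABCDEF",
    "tessedit_char_blacklist": "abcdefGgHhIiLlMmNnOopQqRrSsTtuVvZzJjYyKkWw-!|",
    "load_system_dawg": "F",
    "load_freq_dawg": "F",
}


def build_tesseract_args(image_path: str, patterns_file: str,
                         settings: Dict[str, str], psm: int = 7) -> List[str]:
    """Build a tesseract CLI argument list (psm 7 = single text line, an assumption)."""
    args = ["tesseract", image_path, "stdout", "--psm", str(psm),
            "--user-patterns", patterns_file]
    for name, value in settings.items():
        args += ["-c", f"{name}={value}"]  # one -c per variable
    return args


def run_ocr(image_path: str, patterns_file: str) -> str:
    """Run the tesseract CLI (must be on PATH) and return the recognized text."""
    args = build_tesseract_args(image_path, patterns_file, SETTINGS)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, run_ocr("code.png", r"C:\Util\Code_OCR.Pattern") would OCR a hypothetical image file code.png with the patterns file from the post.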

Thanks in advance.
