Hello. I am trying to recognize the last 4 digits of credit card numbers in 
photos of receipts. These are usually printed as 16 asterisks followed 
immediately by the last 4 digits, with no spaces. I have included an example 
here, cropped from a larger receipt. For security it shows only 2 of the 4 
digits, which is enough to see that the digits come out reasonably clearly.

The problem is that tesseract's output for this line is 
KKKKKKKEKEKERBQIGL. I thought I could get around this with 
`--user-patterns`, so I created a file `eng.user-patterns` containing 
`\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\d\d\d\d`. I also tried 
`****************\d\d\d\d`, because I am not sure whether `*` has to be 
escaped or only `\`. I ran this with `tesseract image.jpg output.txt -l eng 
--user-patterns eng.user-patterns`, but the output does not seem to be 
affected: that line is still the same gibberish. I also tried user words 
containing the exact last 4 digits I am looking for, with the same result. 
I am using tesseract 4.1.1.
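For anyone trying to reproduce this, a minimal sketch (not a fix) of the 
setup described above: it generates the escaped pattern line instead of 
counting asterisks by hand, and expresses the same intended shape as an 
ordinary Python regex for checking a line of OCR output. The helper names 
are hypothetical. It also assumes the user-patterns file takes one pattern 
per line, and the `re` regex is Python syntax, not tesseract's 
user-patterns syntax.

```python
import re
from pathlib import Path
from typing import List, Optional


def build_card_pattern(masked: int = 16, digits: int = 4) -> str:
    """Hypothetical helper: one user-patterns line made of `masked`
    escaped asterisks followed by `digits` \\d placeholders."""
    return r"\*" * masked + r"\d" * digits


def write_user_patterns(path: str, patterns: List[str]) -> None:
    """Write patterns to `path`, one per line (assumed file format)."""
    Path(path).write_text("\n".join(patterns) + "\n", encoding="ascii")


# The same shape in Python's re syntax: exactly 16 literal asterisks,
# then exactly 4 digits, captured as group 1.
CARD_LINE = re.compile(r"\*{16}(\d{4})")


def last_four(ocr_line: str) -> Optional[str]:
    """Return the 4 trailing digits if the whole line has the expected
    shape, otherwise None."""
    match = CARD_LINE.fullmatch(ocr_line.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(build_card_pattern())
    print(last_four("KKKKKKKEKEKERBQIGL"))  # the garbled output: no match
    print(last_four("*" * 16 + "1234"))     # placeholder digits, not real data
```

Generating the line makes it easy to confirm the file really contains 16 
asterisks, which rules out a miscounted pattern as the reason the output 
was unaffected.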

Is there anything I can try besides retraining? It doesn't seem like such a 
hard case.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/25249612-ff65-4849-9f21-3ed3d5936f66n%40googlegroups.com.
