Hello. I am trying to recognize the last 4 digits of credit card numbers in pictures of receipts. Usually these appear as 16 asterisks followed immediately by the last 4 digits, with no spaces. I have included an example here that shows only 2 of the 4 digits for security, which is enough to see that the numbers are reasonably legible. It is cropped from a larger receipt.
The problem is that the output from tesseract for this line is `KKKKKKKEKEKERBQIGL`. I thought I could get around this by specifying `--user-patterns`. I created a file `eng.user-patterns` with the contents `\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\d\d\d\d`. I also tried `****************\d\d\d\d`, because I am not sure whether I have to escape `*` or only `\`. I ran this with `tesseract image.jpg output.txt -l eng --user-patterns eng.user-patterns`, but the output does not seem to be affected: that line is still the same gibberish. I also tried user words containing the exact last 4 digits I am looking for, with the same result. I am using tesseract 4.1.1. Is there anything I can try besides retraining? It does not seem like such a hard case. Thanks
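For anyone who wants to reproduce this, here is a minimal Python sketch of the kind of invocation and post-filtering involved. It is only a sketch under assumptions, not a confirmed fix: the helper names `build_tesseract_command` and `extract_last_four` are hypothetical, and the choice of `--psm 7` (treat the image as a single text line) together with a `tessedit_char_whitelist` of digits and `*` is an assumption that the image has already been cropped to the card-number line. Whether the whitelist helps with the 4.1.1 LSTM engine on this particular image is untested. Note also that tesseract appends `.txt` to the output base name itself, so the sketch passes `output` rather than `output.txt`.

```python
import re
import subprocess


def build_tesseract_command(image_path, output_base, whitelist="0123456789*"):
    """Build a tesseract CLI call for a single cropped line of text.

    Assumptions (not from the original post): --psm 7 and a whitelist
    restricted to digits and '*'.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", "eng",
        "--psm", "7",
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def extract_last_four(ocr_line):
    """Return the 4 digits at the end of an OCR'd line, or None.

    Only the trailing digits are checked, so the result does not depend
    on how the asterisks themselves were recognized.
    """
    match = re.search(r"(\d{4})\s*$", ocr_line.strip())
    return match.group(1) if match else None


def run_ocr(image_path, output_base="output"):
    """Run tesseract, read <output_base>.txt, and return the last 4 digits."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        for line in handle:
            digits = extract_last_four(line)
            if digits is not None:
                return digits
    return None
```

Calling `run_ocr("image.jpg")` would write `output.txt` and return the trailing digits if any line ends in four of them. The regex step is deliberately lenient about the asterisks, since those appear to be what the engine misreads.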