Greetings, I'm using Tesseract 4.0.0 in a C/C++ application where I capture an image and then "scrape" text/data from it. I am having trouble getting Tesseract to recognize an ROI that contains only a few characters (see attached).
The attached image is: *014*
Recognized as: */~—6h014 5*

If I trim the extra space around the number, the result improves. The problem is that the string of characters sometimes falls outside the ROI, so I have to enlarge the ROI to capture all of them. I've tried using OpenCV to grayscale, blur, and resize the image, which seems to help a little. I've also tried all the PSM modes.

The other puzzling thing is that it works great from the command line. Maybe this is because the image is saved as a JPG before the OCR is done, whereas inside the application it's raw data.

Any thoughts?

Ed

Tesseract version:
tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
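
For illustration, here is a minimal sketch of the raw-data path mentioned above. It is not the actual application code. It assumes the captured frame is an interleaved 3-channel BGR buffer (OpenCV's default channel order) whose rows may carry padding, and it repacks it into a tight 8-bit grayscale buffer. The names GrayImage and BgrToPackedGray are hypothetical. A mismatch between the real buffer layout and the width, bytes-per-pixel, and bytes-per-line values passed to TessBaseAPI::SetImage is one possible difference between the in-app and command-line paths, though nothing in this post confirms that it is the cause here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Tightly packed 8-bit grayscale image: one byte per pixel, no row padding.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Hypothetical helper: convert an interleaved BGR buffer whose rows start
// `stride_bytes` apart into a packed grayscale image.
GrayImage BgrToPackedGray(const std::uint8_t* bgr, int width, int height,
                          std::size_t stride_bytes) {
    if (bgr == nullptr || width <= 0 || height <= 0 ||
        stride_bytes < static_cast<std::size_t>(width) * 3) {
        throw std::invalid_argument("invalid BGR buffer description");
    }
    GrayImage out;
    out.width = width;
    out.height = height;
    out.pixels.resize(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        // Use the source stride, not width * 3, so row padding is skipped.
        const std::uint8_t* row = bgr + static_cast<std::size_t>(y) * stride_bytes;
        for (int x = 0; x < width; ++x) {
            const unsigned b = row[3 * x + 0];
            const unsigned g = row[3 * x + 1];
            const unsigned r = row[3 * x + 2];
            // Integer approximation of the BT.601 luma weights (0.299, 0.587, 0.114).
            out.pixels[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
        }
    }
    return out;
}

// The hand-off to Tesseract would then look roughly like this (not compiled here):
//   api.SetImage(gray.pixels.data(), gray.width, gray.height,
//                /*bytes_per_pixel=*/1, /*bytes_per_line=*/gray.width);
```

With a packed grayscale buffer, bytes_per_pixel is 1 and bytes_per_line equals the width, which leaves less room for a stride or channel-count mismatch than passing the capture buffer directly.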