[tesseract-ocr] Extra characters showing up

Ed Dow Thu, 24 Feb 2022 22:02:22 -0800

Greetings,

I'm using tesseract 4.0.0 in a C/C++ application where I capture an image
and then "scrape" text/data from it. I'm having trouble getting tesseract
to recognize a region of interest (ROI) that contains only a few characters
(see attached).


The attached image shows: *014*
Recognized as: */~—6h014 5*

If I crop away the extra space around the number, recognition improves,
but sometimes the characters fall outside that tighter ROI, so I have to
enlarge it to capture all of them.
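
One way to reconcile a tight crop with characters that drift is to keep a generous ROI and then crop it automatically to the bounding box of the dark pixels, plus a small white border (Tesseract's "Improving the quality of the output" notes warn that text cropped right to the edge can cause problems). A minimal sketch in plain C++, assuming an 8-bit grayscale, row-major buffer with dark text on a light background; `GrayImage` and `cropToInk` are hypothetical names, not part of Tesseract or OpenCV:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: 8-bit grayscale, row-major, no row padding.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height bytes, 0 = black
};

// Crops `img` to the bounding box of all pixels darker than `ink_threshold`
// and surrounds the result with `margin` pixels of white (255) on every side.
// Returns the input unchanged if no pixel is darker than the threshold.
GrayImage cropToInk(const GrayImage& img, std::uint8_t ink_threshold, int margin) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    assert(margin >= 0);

    // Find the extent of the dark ("ink") pixels.
    int min_x = img.width, min_y = img.height, max_x = -1, max_y = -1;
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (img.pixels[static_cast<std::size_t>(y) * img.width + x] < ink_threshold) {
                min_x = std::min(min_x, x);
                max_x = std::max(max_x, x);
                min_y = std::min(min_y, y);
                max_y = std::max(max_y, y);
            }
        }
    }
    if (max_x < 0) return img;  // no ink found: nothing to crop

    const int ink_w = max_x - min_x + 1;
    const int ink_h = max_y - min_y + 1;

    // Start from an all-white canvas, then copy the ink box into its center.
    GrayImage out;
    out.width = ink_w + 2 * margin;
    out.height = ink_h + 2 * margin;
    out.pixels.assign(static_cast<std::size_t>(out.width) * out.height, 255);
    for (int y = 0; y < ink_h; ++y) {
        for (int x = 0; x < ink_w; ++x) {
            out.pixels[static_cast<std::size_t>(y + margin) * out.width + (x + margin)] =
                img.pixels[static_cast<std::size_t>(min_y + y) * img.width + (min_x + x)];
        }
    }
    return out;
}
```

Any stray dark mark inside the enlarged ROI (for example the edge of a neighboring UI element, which might be where the extra characters come from) will stretch the box, so in practice it may help to drop small or distant blobs first, e.g. with OpenCV's `cv::connectedComponentsWithStats`.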

I've tried using OpenCV to convert to grayscale, blur, and resize the image,
which seems to help a little. I've also tried all the page segmentation
(PSM) modes.
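
The grayscale, blur and resize pipeline stops short of binarization. A further step worth trying (a suggestion, not something from the original post) is to threshold the crop yourself so you can save and inspect exactly what is handed to the recognizer; Tesseract also thresholds internally, but an explicit threshold is easier to check. The sketch below implements Otsu's method on an 8-bit grayscale buffer; in OpenCV the equivalent call is `cv::threshold(gray, bw, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU)`. `otsuThreshold` and `binarize` are hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method (hypothetical helper name): returns the threshold t that
// maximizes the between-class variance of the classes {v <= t} and {v > t}.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    assert(!gray.empty());
    std::array<double, 256> hist{};
    for (std::uint8_t v : gray) hist[v] += 1.0;

    const double total = static_cast<double>(gray.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i) sum_all += i * hist[i];

    double weight_lo = 0.0, sum_lo = 0.0, best_score = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        weight_lo += hist[t];  // number of pixels with value <= t
        sum_lo += t * hist[t];
        if (weight_lo == 0.0) continue;  // lower class still empty
        const double weight_hi = total - weight_lo;
        if (weight_hi == 0.0) break;  // upper class empty from here on
        const double mean_lo = sum_lo / weight_lo;
        const double mean_hi = (sum_all - sum_lo) / weight_hi;
        const double diff = mean_lo - mean_hi;
        // Between-class variance, up to the constant factor 1 / total^2.
        const double score = weight_lo * weight_hi * diff * diff;
        if (score > best_score) {
            best_score = score;
            best_t = t;
        }
    }
    return best_t;
}

// Maps pixels <= t to black (0) and the rest to white (255).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray, int t) {
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(gray[i] <= t ? 0 : 255);
    }
    return out;
}
```

Running this on the auto-cropped ROI, rather than the whole frame, keeps the histogram dominated by the digits and their immediate background.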

The other puzzling thing is that it works great from the command line.
Maybe that's because the image is saved as a JPEG before OCR runs there,
whereas inside the application tesseract gets the raw pixel data.
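
On the command-line vs. in-application difference, one possible cause (an unverified guess, not confirmed in this thread) is how the raw buffer is described to Tesseract. `TessBaseAPI::SetImage(imagedata, width, height, bytes_per_pixel, bytes_per_line)` takes the row stride explicitly, and an OpenCV ROI that is a view into a larger frame has a stride (`mat.step`) wider than `width * channels`; passing the wrong value shears every row. OpenCV frames are also BGR rather than RGB. The sketch below sidesteps both by copying the ROI into a tightly packed 8-bit gray buffer; `packRoiToGray` is a hypothetical helper, and the BGR channel order is an assumption about the capture source:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a width x height region starting at (x0, y0)
// out of an interleaved BGR (3 bytes/pixel) or gray (1 byte/pixel) buffer
// whose rows are `stride` bytes apart, and returns it as tightly packed
// 8-bit gray (row stride == width).
std::vector<std::uint8_t> packRoiToGray(const std::uint8_t* data, std::size_t stride,
                                        int channels, int x0, int y0,
                                        int width, int height) {
    assert(channels == 1 || channels == 3);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        // Step by the real stride, not width * channels.
        const std::uint8_t* row = data + static_cast<std::size_t>(y0 + y) * stride;
        for (int x = 0; x < width; ++x) {
            const std::uint8_t* px = row + static_cast<std::size_t>(x0 + x) * channels;
            std::uint8_t g;
            if (channels == 1) {
                g = px[0];
            } else {
                // Assumed BGR order; integer BT.601 luma weights summing to 256.
                g = static_cast<std::uint8_t>((29 * px[0] + 150 * px[1] + 77 * px[2]) >> 8);
            }
            out[static_cast<std::size_t>(y) * width + x] = g;
        }
    }
    return out;
}
```

With the packed buffer the call would be `api.SetImage(buf.data(), width, height, 1, width)`. Because a raw buffer carries no DPI information, it may also be worth calling `api.SetSourceResolution(...)` after `SetImage` with a realistic value for the capture.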

Any thoughts?
Ed


Tesseract Version:

tesseract 4.0.0-beta.1
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
