;-)

> tesseract screenNum_014.jpg -
Estimating resolution as 903
014

> tesseract -v
tesseract 5.1.0
 leptonica-1.83.0 (Jan 26 2022, 19:15:03) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 2019
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV

Zdenko

On Tue, Mar 1, 2022 at 18:22, Ed Dow <eddow1...@gmail.com> wrote:

> Greetings,
>
> I found a potential solution: rewrite each pixel to either white or
> black based on a set threshold. After looking at the OpenCV functions I
> found that "threshold" does just that, but Tesseract was still finding
> "ghost" characters in the white areas of the image. So I had to find
> where the string starts and grab an ROI from that point. Note that the
> THRESH_BINARY_INV parameter to threshold also converts dark colors to
> white and light colors to black. From what I've read, Tesseract likes
> black characters on white backgrounds.
>
> So the solution I came up with is the following, using OpenCV and
> tesseract:
>
>     Mat img; // should already have the image
>     Mat cropped;
>     Mat grayed;
>     Mat inverted;
>     Mat cropNum;
>
>     // Crop the original image to the defined ROI
>     Rect roi(xStart, yStart, xMove, yMove);
>     cropped = img(roi);
>
>     // Convert image to gray
>     cvtColor(cropped, grayed, COLOR_BGR2GRAY);
>
>     // Threshold to black and white, swapping dark and light
>     threshold(grayed, inverted, 100, 255, THRESH_BINARY_INV);
>
>     // Use tesseract to OCR
>     tesseract::TessBaseAPI *ocr = new tesseract::TessBaseAPI();
>     ocr->Init(NULL, "eng", tesseract::OEM_LSTM_ONLY);
>
>     ocr->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>
>     ocr->SetImage(inverted.data, inverted.cols, inverted.rows, 1,
>                   inverted.step);
>
>     // GetUTF8Text() returns a buffer the caller must delete[]
>     char *text = ocr->GetUTF8Text();
>     popupNum = string(text);
>     delete[] text;
>
>     // Release the engine once it is no longer needed
>     ocr->End();
>     delete ocr;
>
> NOTE: Be careful with the 4th parameter of the ocr->SetImage function.
> This is the number of bytes per pixel.
> After converting to grayscale it is 1, not 3.
> I forgot about this and I was getting 3 strings back. Quite strange.
>
> On Thursday, February 24, 2022 at 11:02:27 PM UTC-7 Ed Dow wrote:
>
>> Greetings,
>>
>> I'm using tesseract 4.0.0 in a C/C++ application where I capture an image
>> and then "scrape" text/data from it. I am having issues with tesseract
>> recognizing the ROI with just several characters (see attached).
>>
>> The attached image is: *014*
>> Recognized as: */~—6h014 5*
>>
>> If I get rid of extra space around the number it gets better but the
>> problem is sometimes the string of characters is outside the ROI so I have
>> to increase the size to get all of them.
>>
>> I've tried using OpenCV to grayscale, blur and resize which has seemed to
>> help a little. I've also tried all the PSM modes.
>>
>> The other thing that is puzzling is that from the command line it works
>> great. Maybe this is due to the image being saved as a jpg first before
>> the OCR is done. Inside the application it's raw data.
>>
>> Any thoughts?
>> Ed
>>
>>
>> Tesseract Version:
>>
>> tesseract 4.0.0-beta.1
>> leptonica-1.75.3
>> libgif 5.1.4 :
>> libjpeg 8d (libjpeg-turbo 1.5.2) :
>> libpng 1.6.34 :
>> libtiff 4.0.9 :
>> zlib 1.2.11 :
>> libwebp 0.6.1 :
>> libopenjp2 2.3.0
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bae3383d-84ee-402c-aa2f-af4fe7273a4fn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bae3383d-84ee-402c-aa2f-af4fe7273a4fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
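
The two points this thread turns on, what THRESH_BINARY_INV does to each pixel and what the width, height, bytes-per-pixel and bytes-per-line arguments of SetImage describe, can be sketched in plain C++ without OpenCV or Tesseract. This is only an illustration under stated assumptions: thresholdBinaryInv, RawLayout and describeLayout are hypothetical helpers that exist in neither library, and the image is assumed to be 8-bit and tightly packed (no row padding).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not an OpenCV function: mimics THRESH_BINARY_INV on an
// 8-bit single-channel buffer. Pixels brighter than `thresh` become 0 (black);
// all other pixels become `maxval` (white when maxval is 255).
std::vector<std::uint8_t> thresholdBinaryInv(const std::vector<std::uint8_t>& src,
                                             std::uint8_t thresh,
                                             std::uint8_t maxval) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = (src[i] > thresh) ? std::uint8_t{0} : maxval;
    }
    assert(dst.size() == src.size());
    return dst;
}

// Hypothetical struct holding the four layout values that
// TessBaseAPI::SetImage takes after the data pointer.
struct RawLayout {
    int width;
    int height;
    int bytesPerPixel;  // 1 for grayscale, 3 for BGR/RGB
    int bytesPerLine;   // row stride in bytes
};

// Derive the layout of a tightly packed 8-bit image (assumption: no row
// padding) and reject a channel count that does not match the buffer size.
RawLayout describeLayout(std::size_t bufferSize, int width, int height, int channels) {
    if (width <= 0 || height <= 0 || channels <= 0) {
        throw std::invalid_argument("width, height and channels must be positive");
    }
    RawLayout layout{width, height, channels, width * channels};
    const std::size_t needed =
        static_cast<std::size_t>(layout.bytesPerLine) * static_cast<std::size_t>(height);
    if (needed != bufferSize) {
        throw std::invalid_argument("buffer size does not match the declared layout");
    }
    return layout;
}
```

With a real cv::Mat the row stride should still come from inverted.step, as in the code quoted above, because a cv::Mat is not guaranteed to be tightly packed. The size check here only illustrates why declaring 3 bytes per pixel for a 1-channel buffer cannot describe the data correctly.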