;-)

> tesseract screenNum_014.jpg -
Estimating resolution as 903
014

> tesseract -v
tesseract 5.1.0
 leptonica-1.83.0 (Jan 26 2022, 19:15:03) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 :
libtiff 4.3.0 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 2019
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV


Zdenko


On Tue, 1 Mar 2022 at 18:22, Ed Dow <eddow1...@gmail.com> wrote:

> Greetings,
>
> I found a workable solution: rewrite each pixel to either white or black
> based on a fixed threshold. OpenCV's "threshold" function does exactly
> that, but Tesseract was still finding "ghost" characters in the white
> areas of the image, so I also had to find where the string starts and
> crop an ROI from that point. Note that the THRESH_BINARY_INV flag makes
> threshold invert as well: dark pixels become white and light pixels
> become black. From what I've read, Tesseract prefers black characters on
> a white background.
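The thresholding and "find where the string starts" steps described above can be sketched in plain C++ without OpenCV. This is only an illustration under stated assumptions, not the poster's code: GrayImage, Box, thresholdBinaryInv and inkBounds are hypothetical names, the image is assumed to be a tightly packed row-major 8-bit buffer, and the threshold rule mirrors THRESH_BINARY_INV (values above the threshold become 0, everything else becomes the maximum value).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-channel 8-bit image.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, one byte per pixel
};

// Hypothetical rectangle type: top-left corner plus size.
struct Box {
    int x, y, width, height;
};

// Same rule as THRESH_BINARY_INV: values above thresh become 0 (black),
// all other values become maxval (white when maxval is 255).
GrayImage thresholdBinaryInv(const GrayImage& src, std::uint8_t thresh,
                             std::uint8_t maxval) {
    GrayImage dst{src.width, src.height,
                  std::vector<std::uint8_t>(src.pixels.size())};
    std::transform(src.pixels.begin(), src.pixels.end(), dst.pixels.begin(),
                   [=](std::uint8_t v) -> std::uint8_t {
                       return v > thresh ? std::uint8_t{0} : maxval;
                   });
    return dst;
}

// Smallest box containing every black (0) pixel of a thresholded image.
// Returns a box with zero width and height when there is no black pixel.
Box inkBounds(const GrayImage& img) {
    int minX = img.width, minY = img.height, maxX = -1, maxY = -1;
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * img.width + x;
            if (img.pixels[i] == 0) {
                minX = std::min(minX, x);
                maxX = std::max(maxX, x);
                minY = std::min(minY, y);
                maxY = std::max(maxY, y);
            }
        }
    }
    if (maxX < 0) return Box{0, 0, 0, 0};
    return Box{minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```

Cropping to the returned box, probably with a few pixels of margin, keeps large empty white areas out of the image handed to the OCR engine. That is the effect the ROI step above is after.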
>
> So the solution I came up with is the following using OpenCV and tesseract:
>
>     Mat img;  // should already hold the captured image
>     Mat cropped;
>     Mat grayed;
>     Mat inverted;
>     Mat cropNum;
>
>     // Crop the original image to the defined ROI
>     Rect roi(xStart, yStart, xMove, yMove);
>     cropped = img(roi);
>
>     // Convert the image to grayscale
>     cvtColor(cropped, grayed, COLOR_BGR2GRAY);
>
>     // Threshold to pure black and white, swapping light and dark
>     threshold(grayed, inverted, 100, 255, THRESH_BINARY_INV);
>
>     // Use tesseract to OCR
>     tesseract::TessBaseAPI *ocr = new tesseract::TessBaseAPI();
>     ocr->Init(NULL, "eng", tesseract::OEM_LSTM_ONLY);
>
>     ocr->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>
>     ocr->SetImage(inverted.data, inverted.cols, inverted.rows, 1,
>                   inverted.step);
>
>     // GetUTF8Text returns a heap buffer that the caller must delete[]
>     char *text = ocr->GetUTF8Text();
>     popupNum = string(text);
>     delete[] text;
>
>     // Release the engine once OCR is finished
>     ocr->End();
>     delete ocr;
>
>
>
>     NOTE: Be careful with the 4th parameter of the ocr->SetImage function.
> It is the number of bytes per pixel. After converting to grayscale it's 1,
> not 3. I forgot about this and was getting 3 strings back, which was quite
> strange.
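As a rough sketch of the note above, the two layout arguments that SetImage expects (bytes per pixel and bytes per line) can be derived from the buffer description instead of being hard-coded. SetImageArgs and describeBuffer are hypothetical helper names used only for illustration, and the buffer is assumed to be an interleaved pixel array with optional padding at the end of each row.

```cpp
#include <stdexcept>

// Hypothetical holder for the 4th and 5th SetImage arguments.
struct SetImageArgs {
    int bytesPerPixel;
    int bytesPerLine;
};

// channels: 1 for grayscale, 3 for BGR/RGB.
// bytesPerChannel: 1 for 8-bit data.
// rowPaddingBytes: extra bytes after each row, 0 for a tightly packed buffer.
SetImageArgs describeBuffer(int width, int channels, int bytesPerChannel = 1,
                            int rowPaddingBytes = 0) {
    if (width <= 0 || channels <= 0 || bytesPerChannel <= 0 ||
        rowPaddingBytes < 0) {
        throw std::invalid_argument("invalid image layout");
    }
    const int bytesPerPixel = channels * bytesPerChannel;
    return SetImageArgs{bytesPerPixel,
                        width * bytesPerPixel + rowPaddingBytes};
}
```

For a 200-pixel-wide 8-bit grayscale buffer this gives 1 and 200; for the same width in BGR it gives 3 and 600. With an OpenCV Mat the same two values should be available as elemSize() and step, which avoids passing a stale channel count after a color conversion.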
>
>
>
>
> On Thursday, February 24, 2022 at 11:02:27 PM UTC-7 Ed Dow wrote:
>
>> Greetings,
>>
>> I'm using tesseract 4.0.0 in a C/C++ application where I capture an image
>> and then "scrape" text/data from it.  I am having issues with tesseract
>> recognizing an ROI that contains just a few characters (see attached).
>>
>> The attached image is:  *014*
>> Recognized as:  */~—6h014 5*
>>
>> If I get rid of extra space around the number it gets better but the
>> problem is sometimes the string of characters is outside the ROI so I have
>> to increase the size to get all of them.
>>
>> I've tried using OpenCV to grayscale, blur, and resize the image, which
>> seemed to help a little.  I've also tried all the PSM modes.
>>
>> The other thing that is puzzling is that from the command line it works
>> great.  Maybe this is due to the image being saved as a jpg first before
>> the OCR is done.  Inside the application it's raw data.
>>
>> Any thoughts?
>> Ed
>>
>>
>> Tesseract Version:
>>
>> tesseract 4.0.0-beta.1
>>  leptonica-1.75.3
>>   libgif 5.1.4 :
>>   libjpeg 8d (libjpeg-turbo 1.5.2) :
>>    libpng 1.6.34 :
>>   libtiff 4.0.9 :
>>   zlib 1.2.11 :
>>   libwebp 0.6.1 :
>>   libopenjp2 2.3.0
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bae3383d-84ee-402c-aa2f-af4fe7273a4fn%40googlegroups.com
> .
>

