Re: [tesseract-ocr] Fail to differentiate capital letter I from number 1.

Shahin Majazi Mon, 22 Aug 2022 07:03:36 -0700

With the following preprocessing steps, Tesseract's output improves for
images {I 02, I 03, I 04, I 05, I 06, I 08, I 10, I 11, I 12}, but not
for the others.



1. Grayscale image: img_gray = cv2.imread(img_path, 0)
2. Erosion: img_eroded = cv2.erode(img_gray, np.ones((4, 4), np.uint8), iterations=1)
3. Rescaling: rescaling_img = cv2.resize(img_eroded, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

Lisa Ki <[email protected]> wrote on Saturday, 20 August 2022 at 10:50:

> Hi guys, I am trying to extract text from some simple clips and it just
> keeps reading the capital letter I as the number 1. Does anyone have any
> suggestions?
>
> I have only added borders to the original images as code below:
>
> import cv2
> import numpy as np
> import pytesseract
> from PIL import Image
>
> i = Image.open(ifp).convert('RGB')
> colour = [255, 255, 255]  # white border
> top, bottom, left, right = [150]*4
> i_with_border = cv2.copyMakeBorder(np.array(i), top, bottom, left, right,
>                                    cv2.BORDER_CONSTANT, value=colour)
> ocr_result = pytesseract.image_to_string(i_with_border)
>
> results:
> 101.
>
> 102.
>
> 103.
>
> 104.
>
> 105.
>
> 106.
>
> 107.
>
> 108.
>
> 109.
>
> 110.
>
> I'11.
>
> 112.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/59710fba-c1f8-43b7-ba93-7ad84f9318f2n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/59710fba-c1f8-43b7-ba93-7ad84f9318f2n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

