I'm not sure what you are doing, but try something like this:

import cv2
import numpy as np
import pytesseract


def autoinvert(binarized_img, tresh=0.5):
    """Invert a binarized image if its share of white pixels is below tresh,
    i.e. if the image is mostly black.
    """
    height, width = binarized_img.shape
    non_zero = cv2.countNonZero(binarized_img)
    white_rate = non_zero / (height * width)
    if white_rate < tresh:
        return ~binarized_img
    else:
        return binarized_img
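
For intuition, the decision rule used by autoinvert can be tried on a tiny synthetic image without OpenCV. This is only a sketch: white_rate and autoinvert_list are hypothetical stand-ins written for this illustration, they work on nested lists of 0/255 values, and 255 - v plays the role of ~ on a uint8 array.

```python
# Hypothetical pure-Python stand-ins (not part of the original message).

def white_rate(pixels):
    """Fraction of non-zero pixels in a 2-D list of 0/255 values."""
    total = sum(len(row) for row in pixels)
    non_zero = sum(1 for row in pixels for v in row if v != 0)
    return non_zero / total


def autoinvert_list(pixels, tresh=0.5):
    """Flip 0 <-> 255 when fewer than tresh of the pixels are white."""
    if white_rate(pixels) < tresh:
        return [[255 - v for v in row] for row in pixels]
    return pixels


# Light "text" on a dark background: mostly zeros, so it gets inverted.
dark_bg = [[0, 0, 0, 0],
           [0, 255, 0, 0],
           [0, 0, 0, 0]]

# Dark "text" on a light background: mostly white, returned unchanged.
light_bg = [[255, 255, 255, 255],
            [255, 0, 255, 255],
            [255, 255, 255, 255]]

print(white_rate(dark_bg))                     # 1 white pixel out of 12
print(autoinvert_list(dark_bg)[1])             # [255, 0, 255, 255]
print(autoinvert_list(light_bg) is light_bg)   # True
```

The point of the 0.5 default is that text normally covers less than half of the image, so a mostly black image is assumed to be light text on a dark background.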

filename = 'default.png'
# Placeholder (not in the original message): set this to your tessdata directory.
tessdata = 'path/to/tessdata'
test = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
# With THRESH_OTSU the threshold is computed automatically; the 0 is ignored.
binarized = cv2.threshold(test, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
kernel = np.ones((5, 5), np.uint8)
# Despite the variable name, this is a dilation of the auto-inverted image.
img_erosion = cv2.dilate(autoinvert(binarized), kernel, iterations=1)
# Scale the image so that its height becomes about 40 px.
ratio = round(40 / img_erosion.shape[0], 2)
ocr_image = cv2.resize(img_erosion, (0, 0), fx=ratio, fy=ratio)
output = pytesseract.image_to_string(ocr_image, config=f'--tessdata-dir "{tessdata}" --psm 6')
print(output)

This produces '733 124', so there is still a problem with the decimal point...

Zdenko


On Mon, 27 Jun 2022 at 13:00, Hervé <herve.hey...@gmail.com> wrote:

> Hi
>
> I can't manage to get a 300 dpi image. I tried increasing the picam
> resolution, but I only get 96. I tried with
>
> img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_AREA)
>
> but it only increases the image size, not the DPI.
>
> Thanks
>
>
> On Sunday, 26 June 2022 at 15:24:01 UTC+2, zdenop wrote:
>
>> Check your tesseract version (tesseract -v). Here is mine:
>>
>> tesseract 5.1.0-70-g0df5
>>  leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
>>   libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
>>  Found AVX2
>>  Found AVX
>>  Found FMA
>>  Found SSE4.1
>>  Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
>>  Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
>>
>>
>> Also try to use the (eng) data file from tessdata_best[1] (plain
>> tessdata[2] also produces a result).
>>
>> Regarding the image:
>>
>>    1. I took the output from your code "cv2.imwrite('pH.jpg', ph)" (jpg is
>>    not a good format for OCR).
>>    2. I opened it as grayscale and I see 2 problems covered by the
>>    documentation:
>>       - it needs to be inverted
>>       - it needs to be resized so that the height of the letters is
>>       between 30-40 points.
>>    3. I guess sharpening (to increase the space between the dot and the 3)
>>    would help to recognize the dot.
>>    4. Binarize/threshold the image yourself. Tesseract has some
>>    binarization algorithms, but you can use another one that better fits
>>    your case.
>>
>> I suggest doing the image preprocessing in an image editor (to check what
>> helps) and then implementing it in code.
>>
>> [1] https://github.com/tesseract-ocr/tessdata_best
>> [2] https://github.com/tesseract-ocr/tessdata
>>
>> Zdenko
>>
>>
>> On Sun, 26 Jun 2022 at 0:23, Hervé <herve....@gmail.com> wrote:
>>
>>> Sorry, I am a real beginner.
>>>
>>> When I run: tesseract pH_treshr.png -
>>> I get:
>>> Empty page!!
>>> Empty page!!
>>>
>>> How did you manage to get this image? And why can't I run tesseract on
>>> it like you do? I am on buster with tesseract 5.1.
>>>
>>> Is there a way to discuss this? Discord?
>>>
>>> Thanks for your patience and help
>>>
>>> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>>>
>>>> Sorry - I meant rescaling:
>>>>
>>>> Tesseract works best on images which have a DPI of at least 300 dpi, so
>>>> it may be beneficial to resize images. For more information see the FAQ.
>>>> "Willus Dotkom" made an interesting test of optimal image resolution,
>>>> with a suggestion for the optimal height of a capital letter in pixels:
>>>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>>>
>>>>
>>>> After that, you can get output (but the dot is missing) with the
>>>> command line: "tesseract pH_treshr.png -"
>>>>
>>>> I was able to get the decimal point separator with the letsgodigital
>>>> data file
>>>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>>>> tesseract pH_treshr.png - -l letsgodigital
>>>>
>>>> Or have a look at SSD https://github.com/Shreeshrii/tessdata_ssd
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 25 Jun
2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>>>
>>>>> I am on tesseract 5
>>>>>
>>>>> Inverting images
>>>>>
>>>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>>>> background and light text) without problem, for 4.x version use dark text
>>>>> on light background.
>>>>>
>>>>> Isn't that the same as:
>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>
>>>>> As for resizing, I take my picture in full HD. Would increasing the
>>>>> resolution allow tesseract to OCR better?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>>>
>>>>>> Why didn't you try the more relevant hints, like inverting and resizing?
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>>>
>>>>>>> I tried a gray image and a black-and-white one, and I use
>>>>>>>
>>>>>>> custom_psm = r'--psm 7'
>>>>>>>
>>>>>>> I didn't try other parameters.
>>>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> I just tried some, without real success
>>>>>>>>>
>>>>>>>>> Please be specific: what did you try and what was the result?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Could I train on digits from pictures? Maybe this font is not well
>>>>>>>>> recognized
>>>>>>>>>
>>>>>>>>
>>>>>>>> Any training is useless if the failure is at the image
>>>>>>>> preprocessing stage.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>>>
>>>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>>>>
>>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 24 Jun
2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, I need some help to make tesseract-OCR recognize digits:
>>>>>>>>>>> I can't get it to work with
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>>>
>>>>>>>>>>> Here is my code:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> import cv2
>>>>>>>>>>> import pytesseract
>>>>>>>>>>>
>>>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>>>
>>>>>>>>>>> def process_image(img):
>>>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>
>>>>>>>>>>>     ### convert to grayscale
>>>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>
>>>>>>>>>>>     ### run OCR on the grayscale image
>>>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>>>     print(valeur)
>>>>>>>>>>>
>>>>>>>>>>>     ## convert to black and white, then invert
>>>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ### open the image
>>>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ### crop
>>>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>
>>>>>>>>>>> ### cut out the part corresponding to the pH
>>>>>>>>>>> ph= img[516:625, 616:815]
>>>>>>>>>>>
>>>>>>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>>>>>>
>>>>>>>>>>> ### chlorine part
>>>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>>>
>>>>>>>>>>> ### flow fault ("défaut flow") part
>>>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>>>
>>>>>>>>>>> ### process
>>>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>>>> process_image(ph)
>>>>>>>>>>> process_image(cl)
>>>>>>>>>>> #process_image(flow)
>>>>>>>>>>>
>>>>>>>>>>> The digits seem to be clear enough, but it doesn't work. Could
>>>>>>>>>>> someone help me?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>> .
>>>>>>>>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8yq2hHw%2Bcg_LRNZSQ8n-ddUEMKTvKy8DFuxBno-xtpaUg%40mail.gmail.com.