I'm not sure what you are doing, but try something like this:

import cv2
import numpy as np
import pytesseract


def autoinvert(binarized_img, thresh=0.5):
    """Invert a binarized image if its share of white pixels is below
    thresh (with the default 0.5: if the image is mostly black).
    """
    height, width = binarized_img.shape
    non_zero = cv2.countNonZero(binarized_img)
    white_rate = non_zero / (height * width)
    if white_rate < thresh:
        return ~binarized_img
    return binarized_img


filename = 'default.png'
test = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
# Otsu picks the threshold automatically; [1] selects the thresholded image.
binarized = cv2.threshold(test, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
kernel = np.ones((5, 5), np.uint8)
# Dilating the white background thins (erodes) the dark digits.
img_erosion = cv2.dilate(autoinvert(binarized), kernel, iterations=1)
# Scale so that the image height becomes about 40 pixels.
ratio = round(40 / img_erosion.shape[0], 2)
ocr_image = cv2.resize(img_erosion, (0, 0), fx=ratio, fy=ratio)

# tessdata must be set to the directory that holds your *.traineddata files.
output = pytesseract.image_to_string(ocr_image,
                            config=f'--tessdata-dir "{tessdata}" --psm 6')
print(output)

This produces '733 124', so there is still a problem with the decimal
point...
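
The resize step follows the advice quoted further down: aim for letters roughly 30 to 40 pixels high rather than for a particular DPI value. Here is a minimal sketch of that arithmetic in plain Python, so it runs without OpenCV. The helper name scale_to_height is made up for illustration, and the 109 x 199 size is an assumption taken from the pH crop img[516:625, 616:815] in the original question, treated as a single line of text.

```python
def scale_to_height(height, width, target_height=40):
    """Return (ratio, new_height, new_width) for a uniform resize.

    Mirrors ratio = round(40 / image_height, 2) from the snippet above.
    The new size is an estimate (rounded product); the exact size that
    cv2.resize produces may differ by a pixel.
    """
    ratio = round(target_height / height, 2)
    return ratio, round(height * ratio), round(width * ratio)


# Assumed input: the 109 x 199 pixel pH crop from the original question.
ratio, new_h, new_w = scale_to_height(109, 199)
print(ratio, new_h, new_w)  # prints: 0.37 40 74
```

This treats the whole crop as one text line. If the crop has margins around the digits, the letters end up smaller than 40 pixels, so a larger target height may work better.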

Zdenko


On Mon, 27 Jun 2022 at 13:00, Hervé <herve.hey...@gmail.com> wrote:

> Hi
>
> I can't manage to get a 300 dpi image. I tried increasing the picam
> resolution, but I still only get 96. I also tried
>
> img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_AREA)
>
> but it only grows the image size, not the DPI.
>
> Thanks
>
>
> On Sunday, 26 June 2022 at 15:24:01 UTC+2, zdenop wrote:
>
>> Check your tesseract version (tesseract -v). Here is mine:
>>
>> tesseract 5.1.0-70-g0df5
>>  leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
>>   libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 :
>> libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
>>  Found AVX2
>>  Found AVX
>>  Found FMA
>>  Found SSE4.1
>>  Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>> libzstd/1.4.9
>>  Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
>>
>>
>> Also try the eng data file from tessdata_best [1] (plain tessdata [2]
>> also produces a result).
>>
>> Regarding image:
>>
>>    1. I took the output of "cv2.imwrite('pH.jpg', ph)" in your code (JPG
>>    is not a good format for OCR).
>>    2. I opened it as grayscale and I see 2 problems covered by the
>>    documentation:
>>       - it needs to be inverted
>>       - it needs to be resized so that the height of the letters is
>>       between 30 and 40 pixels.
>>    3. I guess sharpening (to increase the space between the dot and the 3)
>>    would help to recognize the dot.
>>    4. Binarize/threshold the image yourself. Tesseract has some binarization
>>    algorithms, but you can use another one that better fits your case.
>>
>> I suggest doing the image preprocessing in an image editor first (to check
>> what helps) and then implementing it in code.
>>
>> [1] https://github.com/tesseract-ocr/tessdata_best
>> [2] https://github.com/tesseract-ocr/tessdata
>>
>> Zdenko
>>
>>
>> On Sun, 26 Jun 2022 at 0:23, Hervé <herve....@gmail.com> wrote:
>>
>>> Sorry, I am a real beginner.
>>>
>>> When I run: tesseract pH_treshr.png -
>>> I get:
>>> Empty page!!
>>> Empty page!!
>>>
>>> How did you obtain this image, and why can't I run tesseract on it
>>> as you did? I am on buster with tesseract 5.1.
>>>
>>> Is there a way to chat about this? Discord?
>>>
>>> Thanks for your patience and help.
>>>
>>> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>>>
>>>> Sorry - I mean Rescaling:
>>>>
>>>> Tesseract works best on images which have a DPI of at least 300 dpi, so
>>>> it may be beneficial to resize images. For more information see the FAQ.
>>>> "Willus Dotkom" made an interesting test of optimal image resolution,
>>>> with a suggestion for the optimal height of capital letters in pixels:
>>>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>>>
>>>>
>>>> After that, you can get output (but the dot is missing) with the
>>>> command line: "tesseract pH_treshr.png -"
>>>>
>>>> I was able to get the decimal point separator with the letsgodigital
>>>> data file
>>>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>>>> tesseract pH_treshr.png - -l letsgodigital
>>>>
>>>> Or have a look at SSD: https://github.com/Shreeshrii/tessdata_ssd
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 25 Jun 2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>>>
>>>>> I am on tesseract 5
>>>>>
>>>>> Inverting images
>>>>>
>>>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>>>> background and light text) without problem, for 4.x version use dark text
>>>>> on light background.
>>>>> Isn't that the same as:
>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY
>>>>> | cv2.THRESH_OTSU)
>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>
>>>>> As for resizing, I take my picture in full HD. Would increasing the
>>>>> resolution help tesseract OCR it better?
>>>>>
>>>>> thanks
>>>>>
>>>>>
>>>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>>>
>>>>>> Why did you not try the more relevant hints, like inverting and resizing?
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>>>
>>>>>>> I tried a gray image and a black-and-white image, and I use
>>>>>>>
>>>>>>>  custom_psm = r'--psm 7'
>>>>>>>
>>>>>>> I didn't try any other parameters.
>>>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> I just tried some, without real success
>>>>>>>>>
>>>>>>>> Please be specific: what did you try and what was the result?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Could I train tesseract on digits from my pictures? Maybe this font
>>>>>>>>> is not well recognized.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Any training is useless if the failure is at the image
>>>>>>>> preprocessing stage.
>>>>>>>>
>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>>
>>>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>>>
>>>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>>>>
>>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, I need some help getting tesseract-OCR to recognize digits.
>>>>>>>>>>> I can't get it to work with
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>>>
>>>>>>>>>>> here is my code :
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> import cv2
>>>>>>>>>>> import pytesseract
>>>>>>>>>>>
>>>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>>>
>>>>>>>>>>> def process_image(img):
>>>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>
>>>>>>>>>>>     ### convert to grayscale
>>>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>
>>>>>>>>>>>     ### analyze the image
>>>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>>>     print(valeur)
>>>>>>>>>>>
>>>>>>>>>>>     ## convert to black and white
>>>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255,
>>>>>>>>>>> cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ### open the image
>>>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ### crop
>>>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>
>>>>>>>>>>> ### cut out the part corresponding to the pH
>>>>>>>>>>> ph= img[516:625, 616:815]
>>>>>>>>>>>
>>>>>>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>>>>>>
>>>>>>>>>>> ### chlorine part
>>>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>>>
>>>>>>>>>>> ### flow fault part
>>>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>>>
>>>>>>>>>>> ### process
>>>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>>>> process_image(ph)
>>>>>>>>>>> process_image(cl)
>>>>>>>>>>> #process_image(flow)
>>>>>>>>>>>
>>>>>>>>>>> The digits seem clear enough, but it doesn't work. Could someone
>>>>>>>>>>> help me?
>>>>>>>>>>>
>>>>>>>>>>> thanks !
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>> .
>>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8yq2hHw%2Bcg_LRNZSQ8n-ddUEMKTvKy8DFuxBno-xtpaUg%40mail.gmail.com.
