The decimal point is not a problem: I can divide by 100 or 10 and it works :)
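
A minimal sketch of that workaround, assuming the OCR output is a string of digit groups such as the '733 124' quoted below and that the number of decimals shown on the display is fixed and known. `restore_decimals` is a hypothetical helper name, not part of pytesseract.

```python
def restore_decimals(ocr_text, decimals=2):
    """Turn digit groups read without a decimal point back into floats.

    Assumes every reading has a fixed number of decimals. Tokens that are
    not purely digits are skipped rather than guessed at.
    """
    values = []
    for token in ocr_text.split():
        if token.isdecimal():
            values.append(int(token) / 10 ** decimals)
    return values


print(restore_decimals("733 124"))          # two decimals: divide by 100
print(restore_decimals("733", decimals=1))  # one decimal: divide by 10
```

This only works while the display format stays the same; a reading with a different number of digits after the point would be scaled wrongly.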

Could you share the whole code with me? Thanks

On Monday, 27 June 2022 at 20:44:42 UTC+2, zdenop wrote:

> Not sure what you are doing, but try something like this:
>
> import cv2
> import numpy as np
> import pytesseract
>
> def autoinvert(binarized_img, thresh=0.5):
>     """Invert a binarized image if its share of white pixels is below thresh."""
>     height, width = binarized_img.shape
>     non_zero = cv2.countNonZero(binarized_img)
>     white_rate = non_zero / (height * width)
>     if white_rate < thresh:
>         return ~binarized_img
>     return binarized_img
>
> # Placeholder (not in the original message): path to your tessdata directory
> tessdata = 'tessdata'
> filename = 'default.png'
> test = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
> binarized = cv2.threshold(test, 0, 255,
>                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
> kernel = np.ones((5, 5), np.uint8)
> # Dilating the white background thins the dark strokes
> img_erosion = cv2.dilate(autoinvert(binarized), kernel, iterations=1)
> # Scale so that the image is about 40 px high
> ratio = round(40 / img_erosion.shape[0], 2)
> ocr_image = cv2.resize(img_erosion, (0, 0), fx=ratio, fy=ratio)
>
> output = pytesseract.image_to_string(ocr_image,
>                             config=f'--tessdata-dir "{tessdata}" --psm 6')
> print(output)
>
> This produces '733 124', so there is still a problem with the decimal
> point...
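
To see the decision that autoinvert makes without needing OpenCV, here is a pure-Python sketch on a tiny hand-made image. It assumes a binarized image stored as nested lists with values 0 or 255; `white_rate` and `autoinvert_rows` are illustrative names, not OpenCV functions.

```python
def white_rate(rows):
    """Fraction of non-zero (white) pixels in a nested-list image."""
    total = sum(len(row) for row in rows)
    non_zero = sum(1 for row in rows for value in row if value != 0)
    return non_zero / total


def autoinvert_rows(rows, thresh=0.5):
    """Invert a 0/255 image when less than `thresh` of its pixels are white."""
    if white_rate(rows) < thresh:
        return [[255 - value for value in row] for row in rows]
    return rows


# Light digits on a dark background: only 2 of 8 pixels are white.
dark_background = [
    [0, 255, 0, 0],
    [0, 255, 0, 0],
]
print(white_rate(dark_background))
print(autoinvert_rows(dark_background))
```

The heuristic relies on text covering less area than the background, which is usually true for a cropped display but not guaranteed.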
>
> Zdenko
>
>
> On Mon, 27 Jun 2022 at 13:00, Hervé <herve....@gmail.com> wrote:
>
>> Hi
>>
>> I can't manage to get a 300 dpi image. I tried increasing the picam
>> resolution, but I still only get 96. I tried
>>
>> img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_AREA)
>>
>> but it only increases the image size, not the DPI.
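
On the DPI question: a camera frame has no physical size, so the DPI value stored in the file is only metadata. As far as I understand, what matters in practice is how many pixels each character spans, which is what the advice quoted further down is about. The sketch below only does the arithmetic for going from a nominal 96 to 300 DPI; the helper name and the 1920x1080 frame size are assumptions for illustration. For enlarging, the OpenCV documentation suggests cv2.INTER_CUBIC or cv2.INTER_LINEAR rather than cv2.INTER_AREA, which is recommended for shrinking.

```python
def upscale_for_dpi(width, height, current_dpi=96, target_dpi=300):
    """Return (scale, new_width, new_height) for a nominal DPI change.

    Only the pixel counts change; resizing does not add any detail.
    """
    scale = target_dpi / current_dpi
    return scale, round(width * scale), round(height * scale)


scale, new_w, new_h = upscale_for_dpi(1920, 1080)
print(scale, new_w, new_h)
# With OpenCV, this factor would be passed as fx=scale, fy=scale.
```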
>>
>> Thanks
>>
>>
>> On Sunday, 26 June 2022 at 15:24:01 UTC+2, zdenop wrote:
>>
>>> Check your tesseract version (tesseract -v). Here is mine:
>>>
>>> tesseract 5.1.0-70-g0df5
>>>  leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
>>>   libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
>>>  Found AVX2
>>>  Found AVX
>>>  Found FMA
>>>  Found SSE4.1
>>>  Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
>>>  Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
>>>
>>>
>>> Also try the (eng) data file from tessdata_best [1] (plain
>>> tessdata [2] also produces a result).
>>>
>>> Regarding the image:
>>>
>>>    1. I took the output from your code "cv2.imwrite('pH.jpg', ph)" (JPEG
>>>    is not a good format for OCR).
>>>    2. I opened it as grayscale and I see 2 problems covered by the
>>>    documentation:
>>>       - it needs to be inverted
>>>       - it needs to be resized so that the height of the letters is
>>>       between 30 and 40 pixels.
>>>    3. I guess sharpening (to increase the space between the dot and the 3)
>>>    would help to recognize the dot.
>>>    4. Binarize/threshold the image yourself. Tesseract has some
>>>    binarization algorithms, but you can use another one that fits your
>>>    case better.
>>>
>>> I suggest doing the image preprocessing in an image editor (to check what
>>> helps) and then implementing it in code.
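
The resizing step in point 2 comes down to one scale factor. A minimal sketch, assuming you have measured the letter height in pixels yourself (for example in an image editor); `scale_for_letter_height` is a hypothetical helper, and 109 is simply the height of the pH crop in the original script, used here as a stand-in for the letter height.

```python
def scale_for_letter_height(letter_px, low=30, high=40):
    """Scale factor that brings a measured letter height into [low, high] px.

    Returns 1.0 when the letters are already in range; otherwise aims for
    the middle of the range.
    """
    if letter_px <= 0:
        raise ValueError("letter height must be positive")
    if low <= letter_px <= high:
        return 1.0
    return ((low + high) / 2) / letter_px


print(scale_for_letter_height(109))  # taller than the range: shrink
print(scale_for_letter_height(12))   # smaller than the range: enlarge
print(scale_for_letter_height(35))   # already in range
```

The resulting factor can be passed to cv2.resize as fx and fy, as in the resize call shown earlier in the thread.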
>>>
>>> [1] https://github.com/tesseract-ocr/tessdata_best
>>> [2] https://github.com/tesseract-ocr/tessdata
>>>
>>> Zdenko
>>>
>>>
>>> On Sun, 26 Jun 2022 at 0:23, Hervé <herve....@gmail.com> wrote:
>>>
>>>> Sorry, I am really a noob.
>>>>
>>>> When I run: tesseract pH_treshr.png -
>>>> I get:
>>>> Empty page!!
>>>> Empty page!!
>>>>
>>>> How did you obtain this image? And why can't I run tesseract on it
>>>> like you did? I am on buster with tesseract 5.1.
>>>>
>>>> Is there a way to discuss this? Discord?
>>>>
>>>> Thanks for your patience and help.
>>>>
>>>> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>>>>
>>>>> Sorry - I meant rescaling:
>>>>>
>>>>> Tesseract works best on images which have a DPI of at least 300 dpi, 
>>>>> so it may be beneficial to resize images. For more information see the 
>>>>> FAQ.
>>>>> "Willus Dotkom" made an interesting test of optimal image resolution,
>>>>> with a suggestion for the optimal height of capital letters in pixels:
>>>>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>>>>
>>>>>
>>>>> After that, you can get output (but the dot is missing) with the 
>>>>> command line: "tesseract pH_treshr.png -"
>>>>>
>>>>> I was able to get the decimal point separator with the letsgodigital 
>>>>> data file 
>>>>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>>>>> tesseract pH_treshr.png - -l letsgodigital
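
For running the same command from a script, here is a small sketch that builds the argument list. `build_tesseract_cmd` is a hypothetical helper; it only uses the `-l`, `--psm` and `--tessdata-dir` options that already appear in this thread, and it assumes the letsgodigital.traineddata file has been placed where tesseract can find it.

```python
import shlex


def build_tesseract_cmd(image, lang=None, psm=None, tessdata_dir=None):
    """Build an argument list for the tesseract CLI, writing text to stdout ("-")."""
    cmd = ["tesseract", image, "-"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


cmd = build_tesseract_cmd("pH_treshr.png", lang="letsgodigital")
print(shlex.join(cmd))
# To actually run it (requires tesseract and the traineddata file installed):
# import subprocess
# result = subprocess.run(cmd, capture_output=True, text=True)
# print(result.stdout)
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.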
>>>>>
>>>>> Or have a look at SSD: https://github.com/Shreeshrii/tessdata_ssd
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Sat, 25 Jun 2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>>>>
>>>>>> I am on tesseract 5
>>>>>>
>>>>>> Inverting images
>>>>>>
>>>>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>>>>> background and light text) without problem, for 4.x version use dark
>>>>>> text on light background.
>>>>>>
>>>>>> Isn't that the same as:
>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255,
>>>>>>                                     cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>     im_bw = cv2.bitwise_not(im_bw)
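
On 8-bit images cv2.bitwise_not flips every bit, which is the same as 255 - value, so the snippet above always inverts, whether or not the input was already dark text on a light background. Also, as far as I know, when cv2.THRESH_OTSU is set the threshold is computed automatically and the 128 is ignored. The pure-Python check below illustrates the first point only; `invert_u8` is an illustrative name, not an OpenCV function.

```python
def invert_u8(value):
    """Bitwise NOT restricted to 8 bits, as applied per pixel on uint8 images."""
    return ~value & 0xFF


# Bitwise NOT and 255 - value agree for every possible 8-bit pixel value.
assert all(invert_u8(v) == 255 - v for v in range(256))

# Inverting is unconditional: applying it twice gives back the original row.
row = [0, 255, 255, 0]
inverted = [invert_u8(v) for v in row]
print(inverted)
print([invert_u8(v) for v in inverted] == row)
```

So the inversion is only correct if the thresholded image really has light text on a dark background; a conditional check on the share of white pixels avoids inverting an image that is already the right way round.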
>>>>>>
>>>>>> As for resizing, I take my picture in full HD. Would increasing the
>>>>>> resolution allow tesseract to OCR it better?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>
>>>>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>>>>
>>>>>>> Why did you not try more relevant hints like inverting and resizing?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>>>>
>>>>>>>> I tried a gray image and a black-and-white one, and I use
>>>>>>>>
>>>>>>>>  custom_psm = r'--psm 7'
>>>>>>>>
>>>>>>>> I didn't try other parameters.
>>>>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>> I just tried some, without real success
>>>>>>>>>
>>>>>>>>> Please be specific: what did you try and what was the result?
>>>>>>>>>
>>>>>>>>>> Could I train it on digits from pictures? Maybe this font is not
>>>>>>>>>> well recognized.
>>>>>>>>>
>>>>>>>>> Any training is useless if the failure is at the image
>>>>>>>>> preprocessing stage.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, I need some help getting tesseract-OCR to recognize digits.
>>>>>>>>>>>> I can't get it to work with
>>>>>>>>>>>>
>>>>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my code:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> import cv2
>>>>>>>>>>>> import pytesseract
>>>>>>>>>>>>
>>>>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>>>>
>>>>>>>>>>>> def process_image(img):
>>>>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### convert to grayscale
>>>>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### OCR the grayscale image
>>>>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>>>>     print(valeur)
>>>>>>>>>>>>
>>>>>>>>>>>>     ## convert to black and white, then invert
>>>>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255,
>>>>>>>>>>>>                                     cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ### open the image
>>>>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ### crop (rows first, then columns)
>>>>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>> ### cut out the region showing the pH
>>>>>>>>>>>> ph = img[516:625, 616:815]
>>>>>>>>>>>>
>>>>>>>>>>>> #cv2.imwrite('pH.jpg', ph)
>>>>>>>>>>>>
>>>>>>>>>>>> ### chlorine region
>>>>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>>>>
>>>>>>>>>>>> ### flow fault region
>>>>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>>>>
>>>>>>>>>>>> ### process
>>>>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>>>>> process_image(ph)
>>>>>>>>>>>> process_image(cl)
>>>>>>>>>>>> #process_image(flow)
>>>>>>>>>>>>
>>>>>>>>>>>> The digits seem clear enough, but it doesn't work. Could
>>>>>>>>>>>> someone help me?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks !
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>>>>>  
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .
>>>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4f74ab5a-4305-4d57-9154-e0bdda7dfb1an%40googlegroups.com.
