Check your tesseract version (tesseract -v). Here is mine:

tesseract 5.1.0-70-g0df5
 leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 :
libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV


Also, try the (eng) data file from tessdata_best[1] (the plain tessdata[2]
also produces a result).

Regarding the image:

   1. I took the output of your code "cv2.imwrite('pH.jpg', ph)" (JPEG is
   not a good format for OCR).
   2. I opened it as grayscale and I see 2 problems that are covered by the
   documentation:
      - it needs to be inverted
      - it needs to be resized so that the height of the letters is
      between 30-40 points.
   3. I guess sharpening (to increase the space between the dot and the 3)
   would help to recognize the dot.
   4. Binarize/threshold the image yourself. Tesseract has some binarization
   algorithms, but you can use another one that better fits your case.

I suggest doing the image preprocessing in an image editor first (to check
what helps) and then implementing it in code.

[1] https://github.com/tesseract-ocr/tessdata_best
[2] https://github.com/tesseract-ocr/tessdata

Zdenko


On Sun, 26 Jun 2022 at 0:23, Hervé <herve.hey...@gmail.com> wrote:

> Sorry, I am a real beginner.
>
> When I run: tesseract pH_treshr.png -
> I get:
> Empty page!!
> Empty page!!
>
> How did you produce this image, and why can't I run tesseract on it like
> you did? I am on buster with tesseract 5.1.
>
> Is there a way to discuss this? Discord?
>
> Thanks for your patience and help.
>
> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>
>> Sorry, I meant rescaling:
>>
>> Tesseract works best on images which have a DPI of at least 300 dpi, so
>> it may be beneficial to resize images. For more information see the FAQ.
>> "Willus Dotkom" ran an interesting test on optimal image resolution, with
>> a suggestion for the optimal height of capital letters in pixels:
>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>
>>
>> After that, you can get output (but the dot is missing) with the command
>> line: "tesseract pH_treshr.png -"
>>
>> I was able to get the decimal point separator with the letsgodigital data
>> file
>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>> tesseract pH_treshr.png - -l letsgodigital
>>
>> Or have a look at SSD: https://github.com/Shreeshrii/tessdata_ssd
>>
>> Zdenko
>>
>>
>> On Sat, 25 Jun 2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>
>>> I am on tesseract 5.
>>>
>>> Inverting images:
>>>
>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>> background and light text) without problem, for 4.x version use dark text
>>> on light background.
>>>
>>> Isn't that the same as:
>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>     im_bw = cv2.bitwise_not(im_bw)
>>>
>>> As for resizing, I take my picture in full HD. Would increasing the
>>> resolution allow tesseract to OCR it better?
>>>
>>> Thanks
>>>
>>>
>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>
>>>> Why did you not try the more relevant hints, like inverting and resizing?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>
>>>>> I tried a gray image and a black-and-white one, and I use
>>>>>
>>>>>  custom_psm = r'--psm 7'
>>>>>
>>>>> I didn't try any other parameters.
>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> I just tried some of them, without real success
>>>>>>>
>>>>>> Please be specific: what did you try and what was the result?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Could I train on the digits from pictures? Maybe this font is not
>>>>>>> well recognized.
>>>>>>>
>>>>>>
>>>>>> Any training is useless if the failure is at the image preprocessing
>>>>>> stage.
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>
>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, I need some help getting tesseract-OCR to recognize digits. I
>>>>>>>>> can't get it to work with this image:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>
>>>>>>>>> here is my code :
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> import cv2
>>>>>>>>> import pytesseract
>>>>>>>>>
>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>
>>>>>>>>> def process_image(img):
>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>
>>>>>>>>>     ### convert to grayscale
>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>
>>>>>>>>>     ### run OCR on the image
>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>     print(valeur)
>>>>>>>>>
>>>>>>>>>     ## convert to black and white
>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ### open the image
>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ### crop
>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>
>>>>>>>>> ### cut out the region corresponding to the pH
>>>>>>>>> ph= img[516:625, 616:815]
>>>>>>>>>
>>>>>>>>> #cv2.imwrite('pH.jpg', ph)
>>>>>>>>>
>>>>>>>>> ### chlorine region
>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>
>>>>>>>>> ### flow fault region
>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>
>>>>>>>>> ### process
>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>> process_image(ph)
>>>>>>>>> process_image(cl)
>>>>>>>>> #process_image(flow)
>>>>>>>>>
>>>>>>>>> The digits seem clear enough, but it doesn't work. Could someone
>>>>>>>>> help me?
>>>>>>>>>
>>>>>>>>> thanks !
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
