Check your tesseract version (tesseract -v). Here is mine:

tesseract 5.1.0-70-g0df5
 leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
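If you prefer to check the version from Python rather than by eye, here is a minimal sketch. The helper names (read_version_output, parse_major_version, needs_dark_text_on_light) are made up for illustration, and the regular expression only assumes that the first line looks like the "tesseract 5.1.0-..." line above. The rule "4.x wants dark text on light background" comes from the documentation quoted further down in this thread; applying it to 5.x as well is my assumption.

```python
import re
import subprocess


def read_version_output(cmd="tesseract"):
    """Run `tesseract -v` and return everything it printed."""
    result = subprocess.run([cmd, "-v"], capture_output=True, text=True, check=True)
    # Some builds print the version on stderr, so keep both streams.
    return result.stdout + result.stderr


def parse_major_version(version_output):
    """Extract the major version number from `tesseract -v` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("no tesseract version found in output")
    return int(match.group(1))


def needs_dark_text_on_light(major_version):
    """3.05 and older handle inverted images; assume 4.x and later do not."""
    return major_version >= 4
```

Typical use would be needs_dark_text_on_light(parse_major_version(read_version_output())), which requires the tesseract binary to be on your PATH.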
Also, try the eng data file from tessdata_best[1] (plain tessdata[2] also produces a result).

Regarding the image:

1. I took the output of "cv2.imwrite('pH.jpg', ph)" from your code (JPEG is not a good format for OCR).
2. I opened it as grayscale and I see 2 problems covered by the documentation:
   - it needs to be inverted
   - it needs to be resized so that the height of the letters is between 30 and 40 points.
3. I guess sharpening (to increase the space between the dot and the 3) would help to recognize the dot.
4. Binarize/threshold the image yourself. Tesseract has some binarization algorithms, but you can use another one that better fits your case.

I suggest doing the image preprocessing in an image editor first (to check what helps) and then implementing it in code.

[1] https://github.com/tesseract-ocr/tessdata_best
[2] https://github.com/tesseract-ocr/tessdata

Zdenko


On Sun, 26 Jun 2022 at 0:23, Hervé <herve.hey...@gmail.com> wrote:

> Sorry, I am really a noob.
>
> When I run: tesseract pH_treshr.png -
> I get:
> Empty page!!
> Empty page!!
>
> How did you manage to get this image? And why can't I run tesseract on it
> like you do? I am on buster with tesseract 5.1.
>
> Is there a way to discuss this? Discord?
>
> Thanks for your patience and help.
>
> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>
>> Sorry - I meant Rescaling:
>>
>> Tesseract works best on images which have a DPI of at least 300 dpi, so
>> it may be beneficial to resize images. For more information see the FAQ.
>>
>> "Willus Dotkom" made an interesting test of optimal image resolution, with
>> a suggestion for the optimal height of capital letters in pixels:
>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>
>> After that, you can get output (but the dot is missing) with the command
>> line: "tesseract pH_treshr.png -"
>>
>> I was able to get the decimal point separator with the letsgodigital data
>> file
>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>> tesseract pH_treshr.png - -l letsgodigital
>>
>> Or have a look at SSD https://github.com/Shreeshrii/tessdata_ssd
>>
>> Zdenko
>>
>>
>> On Sat, 25 Jun 2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>
>>> I am on tesseract 5.
>>>
>>> Inverting images
>>>
>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>> background and light text) without problem, for 4.x version use dark text
>>> on light background.
>>>
>>> Isn't that the same as:
>>> (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY |
>>> cv2.THRESH_OTSU)
>>> im_bw = cv2.bitwise_not(im_bw)
>>>
>>> As for resizing, I take my picture in full HD. Would increasing the
>>> resolution allow tesseract to OCR better?
>>>
>>> Thanks
>>>
>>>
>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>
>>>> Why did you not try more relevant hints like inverting and resizing?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>
>>>>> I tried a gray image and a black-and-white one, and I use
>>>>>
>>>>> custom_psm = r'--psm 7'
>>>>>
>>>>> I didn't try other parameters.
>>>>>
>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>
>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> I just tried some, without real success
>>>>>>>
>>>>>> Please be specific: what did you try and what was the result?
>>>>>>
>>>>>>> Could I train it on digits from pictures?
>>>>>>> Maybe this font is not well recognized.
>>>>>>>
>>>>>> Any training is useless if the failure is at the image preprocessing
>>>>>> stage.
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>
>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, I need some help to make tesseract-OCR recognize digits.
>>>>>>>>> I can't get it to work with
>>>>>>>>>
>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>
>>>>>>>>> Here is my code:
>>>>>>>>>
>>>>>>>>> import cv2
>>>>>>>>> import pytesseract
>>>>>>>>>
>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>
>>>>>>>>> def process_image(img):
>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>
>>>>>>>>>     ### convert to grayscale
>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>
>>>>>>>>>     ### analyze the image
>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>     print(valeur)
>>>>>>>>>
>>>>>>>>>     ## convert to black and white
>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ### open the image
>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>
>>>>>>>>> ### crop
>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>
>>>>>>>>> ### crop the part corresponding to the pH
>>>>>>>>> ph = img[516:625, 616:815]
>>>>>>>>>
>>>>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>>>>
>>>>>>>>> ### chlorine part
>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>
>>>>>>>>> ### flow fault part
>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>
>>>>>>>>> ### process
>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>> process_image(ph)
>>>>>>>>> process_image(cl)
>>>>>>>>> #process_image(flow)
>>>>>>>>>
>>>>>>>>> The digits seem to be clear enough, but it doesn't work. Could
>>>>>>>>> someone help me?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8w8ycN21D0ebP8k7Tr3pH6xns4Eq_k4REhcVBS-cW0yzg%40mail.gmail.com.