The decimal point is not a problem: I can divide by 100 or 10 and it works :) Could you share the whole code with me? Thanks.
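A minimal sketch of that workaround, assuming each display always shows a fixed number of decimals (two for a reading such as 7.33). The helper name `parse_readings` and its `decimals` parameter are made up for illustration and are not part of pytesseract; the sample string '733 124' is the output reported in the quoted message below.

```python
import re


def parse_readings(ocr_text, decimals=2):
    """Convert OCR output without decimal points into floats.

    Assumes every reading has exactly `decimals` digits after the
    (unrecognized) decimal point, so '733' with decimals=2 means 7.33.
    """
    readings = []
    # Keep only runs of digits; anything else (spaces, stray marks) is ignored.
    for token in re.findall(r"\d+", ocr_text):
        readings.append(int(token) / 10 ** decimals)
    return readings


print(parse_readings("733 124"))
```

If one display shows a single decimal, call the helper with `decimals=1` for that crop. The fixed-decimals assumption fails silently if the display can change its format, so a range check on the result (for example, a plausible pH range) may be worth adding.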
On Monday, 27 June 2022 at 20:44:42 UTC+2, zdenop wrote:

> Not sure what you are doing, but try something like this:
>
> import cv2
> import numpy as np
> import pytesseract
>
> def autoinvert(binarized_img, tresh=0.5):
>     """Invert the binarized image if the share of white pixels is below
>     tresh, i.e. if the image is mostly black.
>     """
>     height, width = binarized_img.shape
>     non_zero = cv2.countNonZero(binarized_img)
>     white_rate = non_zero / (height * width)
>     if white_rate < tresh:
>         return ~binarized_img
>     else:
>         return binarized_img
>
> filename = 'default.png'
> test = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
> # Otsu picks the threshold automatically; [1] selects the binarized image.
> binarized = cv2.threshold(test, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
> kernel = np.ones((5, 5), np.uint8)
> # Dilating the white background thins (erodes) the dark digits.
> img_erosion = cv2.dilate(autoinvert(binarized), kernel, iterations=1)
> # Scale the image to a height of about 40 px.
> ratio = round(40 / img_erosion.shape[0], 2)
> ocr_image = cv2.resize(img_erosion, (0, 0), fx=ratio, fy=ratio)
>
> # tessdata must hold the path to your tessdata directory (not defined in this snippet).
> output = pytesseract.image_to_string(ocr_image,
>                                      config=f'--tessdata-dir "{tessdata}" --psm 6')
> print(output)
>
> Which produces '733 124', so there is still a problem with the decimal
> point...
>
> Zdenko
>
>
> On Mon, 27 Jun 2022 at 13:00, Hervé <herve....@gmail.com> wrote:
>
>> Hi
>>
>> I can't manage to get a 300 dpi image. I tried increasing the picam
>> resolution, but I only get 96. I tried
>>
>> img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_AREA)
>>
>> but it only increases the image size, not the DPI.
>>
>> Thanks
>>
>>
>> On Sunday, 26 June 2022 at 15:24:01 UTC+2, zdenop wrote:
>>
>>> Check your tesseract version (tesseract -v).
>>> Here is mine:
>>>
>>> tesseract 5.1.0-70-g0df5
>>>  leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64]
>>>   libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 :
>>>   libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
>>>  Found AVX2
>>>  Found AVX
>>>  Found FMA
>>>  Found SSE4.1
>>>  Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>  libzstd/1.4.9
>>>  Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
>>>
>>>
>>> Also try the (eng) data file from tessdata_best[1] (plain
>>> tessdata[2] also produces a result).
>>>
>>> Regarding the image:
>>>
>>>    1. I took the output from your code "cv2.imwrite('pH.jpg', ph)" (jpg is
>>>    not a good format for OCR).
>>>    2. I opened it as grayscale and I see 2 problems covered by the
>>>    documentation:
>>>       - it needs to be inverted
>>>       - it needs to be resized so that the height of the letters is
>>>       between 30 and 40 points.
>>>    3. I guess sharpening (to increase the space between the dot and the 3)
>>>    would help to recognize the dot.
>>>    4. Binarize/threshold the image yourself. Tesseract has some
>>>    binarization algorithms, but you can use another one that better fits
>>>    your case.
>>>
>>> I suggest doing the image preprocessing in an image editor first (to check
>>> what helps) and then implementing it in code.
>>>
>>> [1] https://github.com/tesseract-ocr/tessdata_best
>>> [2] https://github.com/tesseract-ocr/tessdata
>>>
>>> Zdenko
>>>
>>>
>>> On Sun, 26 Jun 2022 at 0:23, Hervé <herve....@gmail.com> wrote:
>>>
>>>> Sorry, I am a real beginner.
>>>>
>>>> When I run: tesseract pH_treshr.png -
>>>> I get:
>>>> Empty page!!
>>>> Empty page!!
>>>>
>>>> How did you manage to get this image? And why can't I run tesseract on
>>>> it like you did? I am on buster with tesseract 5.1.
>>>>
>>>> Is there a way to discuss this? Discord?
>>>>
>>>> Thanks for your patience and help.
>>>>
>>>> On Saturday, 25 June 2022 at 14:34:06 UTC+2, zdenop wrote:
>>>>
>>>>> Sorry - I mean Rescaling:
>>>>>
>>>>> Tesseract works best on images which have a DPI of at least 300 dpi,
>>>>> so it may be beneficial to resize images. For more information see the
>>>>> FAQ.
>>>>> "Willus Dotkom" made an interesting test of optimal image resolution,
>>>>> with a suggestion for the optimal height of a capital letter in pixels:
>>>>> https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
>>>>>
>>>>>
>>>>> After that, you can get output (but the dot is missing) with the
>>>>> command line: "tesseract pH_treshr.png -"
>>>>>
>>>>> I was able to get the decimal point separator with the letsgodigital
>>>>> data file
>>>>> https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata
>>>>> tesseract pH_treshr.png - -l letsgodigital
>>>>>
>>>>> Or have a look at SSD https://github.com/Shreeshrii/tessdata_ssd
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Sat, 25 Jun 2022 at 12:17, Hervé <herve....@gmail.com> wrote:
>>>>>
>>>>>> I am on tesseract 5.
>>>>>>
>>>>>> Inverting images
>>>>>>
>>>>>> While tesseract version 3.05 (and older) handle inverted image (dark
>>>>>> background and light text) without problem, for 4.x version use dark
>>>>>> text on light background.
>>>>>>
>>>>>> Isn't that the same as:
>>>>>> (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>> im_bw = cv2.bitwise_not(im_bw)
>>>>>>
>>>>>> As for resizing, I take my picture in full HD. Will increasing the
>>>>>> resolution help Tesseract OCR it better?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> On Saturday, 25 June 2022 at 11:25:50 UTC+2, zdenop wrote:
>>>>>>
>>>>>>> Why did you not try the more relevant hints, like inverting and
>>>>>>> resizing?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 25 Jun 2022 at 10:56, Hervé <herve....@gmail.com> wrote:
>>>>>>>
>>>>>>>> I tried a gray image and a black-and-white image, and I use
>>>>>>>>
>>>>>>>> custom_psm = r'--psm 7'
>>>>>>>>
>>>>>>>> I didn't try other parameters.
>>>>>>>>
>>>>>>>> On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote:
>>>>>>>>
>>>>>>>>> On Sat, 25 Jun 2022 at 8:15, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>> I just tried some, without real success.
>>>>>>>>>
>>>>>>>>> Please be specific: what did you try and what was the result?
>>>>>>>>>
>>>>>>>>>> Could I train it on digits from pictures? Maybe this font is not
>>>>>>>>>> well recognized.
>>>>>>>>>
>>>>>>>>> Any training is useless if the failure is at the image
>>>>>>>>> preprocessing stage.
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you try to implement the suggestions from the documentation?
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>>>>
>>>>>>>>>>> Zdenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 24 Jun 2022 at 16:59, Hervé <herve....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, I need some help getting tesseract-OCR to recognize digits.
>>>>>>>>>>>> I can't get it to work with this image:
>>>>>>>>>>>>
>>>>>>>>>>>> https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my code:
>>>>>>>>>>>>
>>>>>>>>>>>> import cv2
>>>>>>>>>>>> import pytesseract
>>>>>>>>>>>>
>>>>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
>>>>>>>>>>>>
>>>>>>>>>>>> def process_image(img):
>>>>>>>>>>>>     #cv2.imshow('Img',img)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### convert to grayscale
>>>>>>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>>>>>>     #cv2.imshow('Img',gray)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### run OCR on the grayscale image
>>>>>>>>>>>>     valeur = pytesseract.image_to_string(gray)
>>>>>>>>>>>>     print(valeur)
>>>>>>>>>>>>
>>>>>>>>>>>>     ### convert to black and white, then invert
>>>>>>>>>>>>     (thresh, im_bw) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
>>>>>>>>>>>>     im_bw = cv2.bitwise_not(im_bw)
>>>>>>>>>>>>     #cv2.imshow('Img',im_bw)
>>>>>>>>>>>>     #cv2.waitKey(0)
>>>>>>>>>>>>     # cv2.imwrite('ph.png',im_bw)
>>>>>>>>>>>>     print(pytesseract.image_to_string(im_bw))
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ### open the image
>>>>>>>>>>>> img = cv2.imread('ocr5.png')
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>> ### crop
>>>>>>>>>>>> imgcoupee = img[1056:1517,950:1862]
>>>>>>>>>>>> #img = cv2.imwrite('ocrcoupee.png',imgcoupee)
>>>>>>>>>>>> # cv2.imshow('Img',imgcoupee)
>>>>>>>>>>>>
>>>>>>>>>>>> ### cut out the region showing the pH
>>>>>>>>>>>> ph = img[516:625, 616:815]
>>>>>>>>>>>>
>>>>>>>>>>>> #cv2.imwrite('pH.jpg', image_pH)
>>>>>>>>>>>>
>>>>>>>>>>>> ### chlorine region
>>>>>>>>>>>> cl = img[516:625, 882:1056]
>>>>>>>>>>>>
>>>>>>>>>>>> ### flow fault region
>>>>>>>>>>>> #flow= img[1302:1398,1054:1400]
>>>>>>>>>>>>
>>>>>>>>>>>> ### process
>>>>>>>>>>>> #process_image(imgcoupee)
>>>>>>>>>>>> process_image(ph)
>>>>>>>>>>>> process_image(cl)
>>>>>>>>>>>> #process_image(flow)
>>>>>>>>>>>>
>>>>>>>>>>>> The digits seem clear enough, but it doesn't work. Could someone
>>>>>>>>>>>> help me?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com
>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a05712a5-e6ed-411f-a072-e389ea7095efn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>> .
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4f74ab5a-4305-4d57-9154-e0bdda7dfb1an%40googlegroups.com.