Thank you very much for your help. I will give it a try.

Best regards
Hai

On Sat, Mar 11, 2023, 8:14 PM Zdenko Podobny <zde...@gmail.com> wrote:

> the latest code (5.3.0) (on Windows)
>
> Zdenko
>
>
> On Sat, Mar 11, 2023 at 2:16 nguyen ngoc hai <nguyenngochaib...@gmail.com>
> wrote:
>
>> Dear Zdenko,
>>
>> Thank you very much for your suggestion.
>>
>> May I ask which version of Tesseract you are using?
>> I ran the same command with tesseract v5.0.0, but I got a different
>> result.
>>
>> ```
>> >tesseract -v
>> tesseract v5.0.0-alpha.20210811
>> ...
>> Warning, detects only orientation with -l jpn
>> Page number: 0
>> Orientation in degrees: 270
>> Rotate: 90
>> Orientation confidence: 46.00
>> Script: Latin
>> Script confidence: 2.00
>> ```
>> Should I upgrade to the newest version of Tesseract, or try some extra
>> preprocessing before detecting the text orientation?
>> Thank you for your time.
>> Best regards
>> Hai
>>
>>
>>
>> On Sat, Mar 11, 2023 at 5:34 AM Zdenko Podobny <zde...@gmail.com> wrote:
>>
>>> Script detection has always been problematic, and Tesseract tries to
>>> identify only a few...
>>>
>>> Regarding rotation, you can get better results by using the language file:
>>> >tesseract unnamed.jpg - --psm 0 -l jpn
>>> Warning, detects only orientation with -l jpn
>>> Estimating resolution as 262
>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>> Page number: 0
>>> Orientation in degrees: 90
>>> Rotate: 270
>>> Orientation confidence: 6.44
>>> Script: Han
>>> Script confidence: 1.43
>>>
>>> Zdenko
>>>
>>>
>>> On Fri, Mar 10, 2023 at 18:21 nguyen ngoc hai <nguyenngochaib...@gmail.com>
>>> wrote:
>>>
>>>> I have the following image:
>>>>
>>>> [image: 17_Receipt Transform No resize.jpg]
>>>>
>>>> I used the following code to get the text orientation. It works for
>>>> most of my samples, but not for the image above.
>>>>
>>>> ```python
>>>> import cv2
>>>> import tesserocr
>>>>
>>>> def get_orientation_confidence(cv2_img_data):
>>>>     # cv2pil is the poster's own helper (not shown) that converts an
>>>>     # OpenCV image to a PIL image
>>>>     image = cv2pil(cv2_img_data)
>>>>     osd_result = {}
>>>>
>>>>     with tesserocr.PyTessBaseAPI(lang='osd') as api:
>>>>         api.SetImage(image)
>>>>         api.SetSourceResolution(300)
>>>>
>>>>         osd_result = api.DetectOrientationScript()
>>>>
>>>>     return osd_result
>>>>
>>>> # The lines below are an excerpt from a method: `image` is the input
>>>> # image, and make_border_white / show_image are helpers not shown here.
>>>> # preprocess image before detecting orientation
>>>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>>> gray_white_border = self.make_border_white(gray)
>>>> self.show_image("gray_white_border", gray_white_border)
>>>>
>>>> # Threshold the image to convert it to black and white
>>>> threshold = cv2.threshold(gray_white_border, 0, 255,
>>>>                           cv2.THRESH_OTSU)[1]
>>>> self.show_image("threshold otsu", threshold)
>>>>
>>>> # pre_roi_im is the preprocessed image (its definition is not shown)
>>>> osd_ret = get_orientation_confidence(pre_roi_im)
>>>> print(osd_ret['orient_deg'])
>>>> ```
>>>> ```cmd
>>>> {'orient_deg': 180, 'orient_conf': 0.06795501708984375, 'script_name':
>>>> 'Arabic', 'script_conf': 0.0}
>>>> ```
>>>> Here the orientation I got is not correct, and the script detection
>>>> is wrong as well.
>>>>
>>>> I hope to get {'orient_deg': 90, 'script_name': 'Japanese', ...}
>>>> I assume these values come directly from Tesseract's output.
>>>>
>>>> Is it possible to get the correct orientation here?
>>>> Assuming that I already know the language, are there any methods (such
>>>> as applying extra image preprocessing, etc.) that can provide better
>>>> accuracy here?
>>>>
>>>> Thank you very much for your time.
>>>> I hope to hear any suggestions.
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>
>>
>> --
>> *Nguyen Ngoc Hai*
>>
>> *Phone: +81 1488 4168 (JP).*
>> *skype ID: nguyenngochaibkhn.*
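
The command-line route suggested in the thread (`tesseract unnamed.jpg - --psm 0 -l jpn`) can also be scripted. The following is only a minimal sketch and not part of the original messages: `parse_osd` and `run_osd` are hypothetical helper names, the dictionary keys simply mirror the tesserocr result shown in the thread, and it assumes that the `tesseract` executable is on the PATH and that the OSD block is printed to standard output when the output base is `-`.

```python
import subprocess


def parse_osd(text):
    """Parse the "key: value" lines of Tesseract's --psm 0 (OSD) output.

    Lines without a colon (warnings, resolution estimates) are skipped.
    Raises KeyError if an expected field is missing from the output.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    # Key names below are an assumption chosen to match the tesserocr dict.
    return {
        "orient_deg": int(fields["Orientation in degrees"]),
        "rotate": int(fields["Rotate"]),
        "orient_conf": float(fields["Orientation confidence"]),
        "script_name": fields["Script"],
        "script_conf": float(fields["Script confidence"]),
    }


def run_osd(image_path, lang="jpn"):
    """Run the command from the thread and return the parsed OSD fields.

    Assumes `tesseract` is installed and that the traineddata for `lang`
    is available; check=True raises CalledProcessError on failure.
    """
    proc = subprocess.run(
        ["tesseract", image_path, "-", "--psm", "0", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_osd(proc.stdout)
```

Applied to the output Zdenko posted, `parse_osd` would give `orient_deg` 90 and `rotate` 270. Since the two Tesseract versions in the thread disagree on the same image, it may be worth checking `orient_conf` before rotating anything automatically.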