Thank you very much for your help.
I will give it a try.

Best regards
Hai


On Sat, Mar 11, 2023, 8:14 PM Zdenko Podobny <zde...@gmail.com> wrote:

> The latest code (5.3.0), on Windows.
>
> Zdenko
>
>
> On Sat, 11. 3. 2023 at 2:16 nguyen ngoc hai <nguyenngochaib...@gmail.com>
> wrote:
>
>> Dear Zdenko,
>>
>> Thank you very much for your suggestion.
>>
>> May I ask which version of tesseract you are using?
>> I ran the same command with tesseract v5.0.0, but I got a different
>> result.
>>
>> ```
>> >tesseract -v
>> tesseract v5.0.0-alpha.20210811
>> ...
>> Warning, detects only orientation with -l jpn
>> Page number: 0
>> Orientation in degrees: 270
>> Rotate: 90
>> Orientation confidence: 46.00
>> Script: Latin
>> Script confidence: 2.00
>> ```
>> Should I upgrade to the newest version of tesseract or try some extra
>> preprocessing methods before detecting text orientation?
>> Thank you for your time.
>> Best regards
>> Hai
>>
>>
>>
>> On Sat, Mar 11, 2023 at 5:34 AM Zdenko Podobny <zde...@gmail.com> wrote:
>>
>>> Script detection has always been problematic, and tesseract tries to
>>> identify only a few scripts.
>>>
>>> Regarding rotation, you can get better results by using the language file:
>>> ```
>>> >tesseract unnamed.jpg - --psm 0 -l jpn
>>> Warning, detects only orientation with -l jpn
>>> Estimating resolution as 262
>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>> Page number: 0
>>> Orientation in degrees: 90
>>> Rotate: 270
>>> Orientation confidence: 6.44
>>> Script: Han
>>> Script confidence: 1.43
>>> ```
>>>
>>> Zdenko
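A minimal sketch of how these numbers could be used from Python, assuming the OSD result is a dict shaped like the one tesserocr's `DetectOrientationScript()` returns (`orient_deg`, `orient_conf`, ...). In both CLI outputs in this thread, "Rotate" equals (360 - orientation) % 360 (90 -> 270, 270 -> 90); tesseract's source comments describe it as the clockwise rotation that makes the page upright, and describe an orientation confidence of about 15.0 as reasonably confident. The 15.0 cutoff and the names `MIN_ORIENT_CONF`, `correction_from_osd` and `rotate_clockwise` are illustrative assumptions, not part of tesseract or tesserocr.

```python
import numpy as np

# Cutoff that tesseract's API comments call "reasonably confident";
# an assumption to tune on your own images, not a hard rule.
MIN_ORIENT_CONF = 15.0


def correction_from_osd(osd, min_conf=MIN_ORIENT_CONF):
    """Return the clockwise rotation (degrees) that should make the page
    upright, or None if the OSD result is missing or below `min_conf`.

    `osd` is a dict shaped like tesserocr's DetectOrientationScript() result.
    """
    if not osd or osd.get("orient_conf", 0.0) < min_conf:
        return None
    # Same relation as the CLI "Rotate:" line: 90 -> 270, 270 -> 90, 180 -> 180
    return (360 - osd["orient_deg"]) % 360


def rotate_clockwise(image, degrees):
    """Rotate an image array clockwise by a multiple of 90 degrees."""
    if degrees % 90:
        raise ValueError("degrees must be a multiple of 90")
    # np.rot90 turns counter-clockwise for positive k, so negate k
    return np.rot90(image, k=-(degrees // 90))


# Values taken from the two CLI runs in this thread
cli_5_3_0 = {"orient_deg": 90, "orient_conf": 6.44}
cli_5_0_0 = {"orient_deg": 270, "orient_conf": 46.00}
print(correction_from_osd(cli_5_3_0))                # None (6.44 < 15.0)
print(correction_from_osd(cli_5_3_0, min_conf=5.0))  # 270
print(correction_from_osd(cli_5_0_0))                # 90
```

Returning None for low-confidence results leaves the caller free to fall back to another method instead of rotating on a guess.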
>>>
>>>
>>> On Fri, 10. 3. 2023 at 18:21 nguyen ngoc hai <nguyenngochaib...@gmail.com>
>>> wrote:
>>>
>>>> I have the following image:
>>>>
>>>>  [image: 17_Receipt Transform No resize.jpg]
>>>>
>>>> I used the following code to get the text orientation; it works for
>>>> most of my samples, but not for the image above.
>>>>
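>>>> ```python
>>>> import cv2
>>>> import tesserocr
>>>>
>>>>
>>>> def get_orientation_confidence(cv2_img_data):
>>>>     # cv2pil(): helper (not shown here) that converts an OpenCV array to a PIL Image
>>>>     image = cv2pil(cv2_img_data)
>>>>     osd_result = {}
>>>>
>>>>     # lang='osd' loads osd.traineddata, the orientation/script detection data
>>>>     with tesserocr.PyTessBaseAPI(lang='osd') as api:
>>>>         api.SetImage(image)
>>>>         api.SetSourceResolution(300)
>>>>         # dict with orient_deg, orient_conf, script_name, script_conf
>>>>         osd_result = api.DetectOrientationScript()
>>>>
>>>>     return osd_result
>>>>
>>>>
>>>> # In the calling method (image, pre_roi_im and the self.* helpers are
>>>> # defined elsewhere, not shown here):
>>>> # preprocess image before detecting orientation
>>>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>>> gray_white_border = self.make_border_white(gray)
>>>> self.show_image("gray_white_border", gray_white_border)
>>>>
>>>> # Threshold the image to convert it to black and white (Otsu picks the level)
>>>> threshold = cv2.threshold(gray_white_border, 0, 255,
>>>>                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
>>>> self.show_image("threshold otsu", threshold)
>>>>
>>>> osd_ret = get_orientation_confidence(pre_roi_im)
>>>> print(osd_ret['orient_deg'])
>>>> ```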
>>>> ```cmd
>>>> {'orient_deg': 180, 'orient_conf': 0.06795501708984375, 'script_name': 'Arabic', 'script_conf': 0.0}
>>>> ```
>>>> These results are not correct: both the orientation and the detected
>>>> language (script) are wrong.
>>>>
>>>> I would like to get {'orient_deg': 90, 'script_name': 'Japanese', ...}.
>>>> I assume these values come directly from tesseract's output.
>>>>
>>>> Is it possible to get the correct orientation here?
>>>> Assuming that I already know the language, are there any methods (such
>>>> as extra image preprocessing) that could improve the accuracy?
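Since the language is known, one common workaround (a different technique from tesseract's OSD, and not something suggested in this thread) is to OCR the image in all four orientations with that language and keep the orientation whose recognized text scores best. A minimal sketch, assuming numpy, plus tesserocr, Pillow and `jpn.traineddata` for the scorer; `best_rotation` and `tesseract_mean_conf` are hypothetical helper names, and tesseract's mean word confidence (`MeanTextConf()`) is only one possible score.

```python
import numpy as np


def best_rotation(image, score_fn, angles=(0, 90, 180, 270)):
    """Rotate `image` clockwise by each angle, score every candidate with
    `score_fn` (higher = more plausible upright text), and return the
    (angle, score) pair with the highest score."""
    best = None
    for angle in angles:
        # np.rot90 turns counter-clockwise for positive k, so negate k
        candidate = np.rot90(image, k=-(angle // 90))
        score = score_fn(candidate)
        if best is None or score > best[1]:
            best = (angle, score)
    return best


def tesseract_mean_conf(image, lang="jpn"):
    """Hypothetical scorer: run tesserocr recognition on a grayscale/binary
    uint8 array and return tesseract's mean word confidence (0-100).
    Requires tesserocr, Pillow and <lang>.traineddata to be installed."""
    import tesserocr
    from PIL import Image

    with tesserocr.PyTessBaseAPI(lang=lang) as api:
        # rot90 returns a strided view; make it contiguous before handing to PIL
        api.SetImage(Image.fromarray(np.ascontiguousarray(image)))
        api.SetSourceResolution(300)
        # MeanTextConf() runs recognition as needed before averaging
        return api.MeanTextConf()
```

For example, `best_rotation(threshold, tesseract_mean_conf)` would return the clockwise angle to apply to the thresholded image. It runs full recognition four times, so it is noticeably slower than OSD, but it relies on the `jpn` model instead of the generic OSD data.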
>>>>
>>>> Thank you very much for your time.
>>>> Any suggestions would be appreciated.
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>
>>
>> --
>> *Nguyen Ngoc Hai*
>>
>> *Phone:  +81 1488 4168  (JP).*
>> *skype ID: nguyenngochaibkhn.*
>>
>>
>
