Dear Zdenko and everyone, 

Thank you for your help last time. 

Apologies for the late reply. I was able to get the same results by 
using the language file you suggested. 
However, that language file gave me less accurate OCR results than the 
model in *tessdata_best*.
It may be troublesome, but is it possible to tell tesseract to use 
different models (for the same language) for different steps?
For example:
Use the Legacy model for OSD, and use the tessdata_best model for 
extracting text.
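
A minimal sketch of the idea, assuming tesserocr is used as in my earlier 
code and that the two model sets are unpacked into two separate 
directories. The directory names and the helper names 
(`cli_rotate_value`, `ocr_with_two_models`) are hypothetical, and the 
rotation direction is an assumption that should be checked on a sample 
with known orientation:

```python
def cli_rotate_value(orient_deg):
    """Return the 'Rotate' value matching the CLI outputs quoted in this thread."""
    return (360 - orient_deg) % 360


def ocr_with_two_models(pil_image, lang="jpn",
                        osd_tessdata="tessdata/", ocr_tessdata="tessdata_best/"):
    """Detect orientation with one model directory, then OCR with another."""
    # Imported here so that cli_rotate_value works without tesserocr installed.
    from tesserocr import OEM, PSM, PyTessBaseAPI

    # Stage 1: orientation only, legacy engine, language file from `tessdata`.
    with PyTessBaseAPI(path=osd_tessdata, lang=lang,
                       psm=PSM.OSD_ONLY, oem=OEM.TESSERACT_ONLY) as api:
        api.SetImage(pil_image)
        osd = api.DetectOrientationScript() or {}

    # Assumption: rotating counter-clockwise by orient_deg makes the page
    # upright (PIL rotates counter-clockwise). Verify on a known sample.
    orient_deg = osd.get("orient_deg", 0)
    upright = pil_image.rotate(orient_deg, expand=True) if orient_deg else pil_image

    # Stage 2: text extraction, LSTM engine, language file from `tessdata_best`.
    with PyTessBaseAPI(path=ocr_tessdata, lang=lang,
                       psm=PSM.AUTO, oem=OEM.LSTM_ONLY) as api:
        api.SetImage(upright)
        text = api.GetUTF8Text()

    return osd, text
```

Each stage opens its own API instance, so the two models never have to 
share one tessdata directory.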

Please also forgive me: for data privacy reasons, I will have to 
delete the uploaded image from the post later. 

Thank you for your time. 
Best regards
Hai


On Sunday, March 12, 2023 at 2:55:52 AM UTC+9 zdenop wrote:

> One more thing: I used a language file from 
> https://github.com/tesseract-ocr/tessdata, i.e. one with legacy engine data.
>
> Zdenko
>
>
> On Sat, Mar 11, 2023 at 13:18, nguyen ngoc hai <nguyenng...@gmail.com> wrote:
>
>> Thank you very much for your help.
>> I will give it a try. 
>>
>> Best regards
>> Hai 
>>
>>
>> On Sat, Mar 11, 2023, 8:14 PM Zdenko Podobny <zde...@gmail.com> wrote:
>>
>>> the latest code (5.3.0) (on windows)
>>>
>>> Zdenko
>>>
>>>
>>> On Sat, Mar 11, 2023 at 2:16, nguyen ngoc hai <nguyenng...@gmail.com> 
>>> wrote:
>>>
>>>> Dear Zdenko,
>>>>
>>>> Thank you very much for your suggestion.
>>>>
>>>> May I ask which version of tesseract you are using?
>>>> I ran the same command with tesseract v5.0.0, but I got a different 
>>>> result. 
>>>>
>>>> ```
>>>> >tesseract -v
>>>> tesseract v5.0.0-alpha.20210811
>>>> ...
>>>> Warning, detects only orientation with -l jpn
>>>> Page number: 0
>>>> Orientation in degrees: 270
>>>> Rotate: 90
>>>> Orientation confidence: 46.00
>>>> Script: Latin
>>>> Script confidence: 2.00
>>>> ```
>>>> Should I upgrade to the newest version of tesseract or try some extra 
>>>> preprocessing methods before detecting text orientation?
>>>> Thank you for your time. 
>>>> Best regards
>>>> Hai
>>>>
>>>>
>>>>
>>>> On Sat, Mar 11, 2023 at 5:34 AM Zdenko Podobny <zde...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Script detection has always been problematic, and tesseract tries to 
>>>>> identify only a few...
>>>>>
>>>>> Regarding rotation you can get better results by using the language 
>>>>> file:
>>>>> >tesseract unnamed.jpg - --psm 0 -l jpn
>>>>> Warning, detects only orientation with -l jpn
>>>>> Estimating resolution as 262
>>>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>>>> Page number: 0
>>>>> Orientation in degrees: 90
>>>>> Rotate: 270
>>>>> Orientation confidence: 6.44
>>>>> Script: Han
>>>>> Script confidence: 1.43
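
The OSD report quoted above is plain `Key: value` text, so it can be read 
back into a dictionary. A minimal sketch, assuming the `tesseract` 
executable is on the PATH; `parse_osd` and `run_osd` are hypothetical 
helper names, not part of tesseract:

```python
import re
import subprocess

# Keys as they appear in the --psm 0 output quoted in this thread.
OSD_KEYS = {
    "Page number", "Orientation in degrees", "Rotate",
    "Orientation confidence", "Script", "Script confidence",
}


def parse_osd(text):
    """Turn the 'Key: value' lines of an OSD report into a dict."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.+?)\s*$", line)
        if not match or match.group(1) not in OSD_KEYS:
            continue  # skip warnings and anything that is not an OSD field
        key, value = match.groups()
        try:
            result[key] = float(value) if "." in value else int(value)
        except ValueError:
            result[key] = value  # e.g. the script name
    return result


def run_osd(image_path, lang="jpn"):
    """Run the same command as above; '-' sends the report to stdout."""
    proc = subprocess.run(
        ["tesseract", str(image_path), "-", "--psm", "0", "-l", lang],
        capture_output=True, text=True, check=True,
    )
    return parse_osd(proc.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to 
test it on saved output without having tesseract installed.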
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Fri, Mar 10, 2023 at 18:21, nguyen ngoc hai <nguyenng...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> I have the following image:
>>>>>>
>>>>>>  [image: 17_Receipt Transform No resize.jpg]
>>>>>>
>>>>>> I used the following code to get the text orientation. It works for 
>>>>>> most of my samples, but not for the image above. 
>>>>>>
>>>>>> ```python
>>>>>>     def get_orientation_confidence(cv2_img_data):
>>>>>>         image = cv2pil(cv2_img_data)
>>>>>>         osd_result = {}
>>>>>>
>>>>>>         with tesserocr.PyTessBaseAPI(lang='osd') as api:
>>>>>>             api.SetImage(image)
>>>>>>             api.SetSourceResolution(300)
>>>>>>
>>>>>>             osd_result = api.DetectOrientationScript()
>>>>>>
>>>>>>         return osd_result
>>>>>>
>>>>>>     # preprocess image before detecting orientation
>>>>>>     gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>>>>>     gray_white_border = self.make_border_white(gray)
>>>>>>     self.show_image("gray_white_border", gray_white_border)
>>>>>>
>>>>>>     # Threshold the image to convert it to black and white
>>>>>>     threshold = cv2.threshold(gray_white_border, 0, 255, cv2.THRESH_OTSU)[1]
>>>>>>     self.show_image("threshold otsu", threshold)
>>>>>>
>>>>>>     osd_ret = get_orientation_confidence(pre_roi_im)
>>>>>>     print(osd_ret['orient_deg'])
>>>>>> ```
>>>>>> ```cmd
>>>>>> {'orient_deg': 180, 'orient_conf': 0.06795501708984375, 
>>>>>> 'script_name': 'Arabic', 'script_conf': 0.0}
>>>>>> ```
>>>>>> Here, the results are not correct: both the orientation and the 
>>>>>> detected script are wrong. 
>>>>>>
>>>>>> I hope to get {'orient_deg': 90, 'script_name': 'Japanese', ...}. 
>>>>>> I assume these values come directly from tesseract's own output. 
>>>>>>
>>>>>> Is it possible to get the correct orientation here? 
>>>>>> Assuming that I already know the language, are there any methods 
>>>>>> (such as applying extra image preprocessing) that can provide 
>>>>>> better accuracy?
>>>>>>
>>>>>> Thank you very much for your time. 
>>>>>> I hope to hear any suggestions. 
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> *Nguyen Ngoc Hai*
>>>>
>>>> *Phone:  +81 1488 4168  (JP).*
>>>> *skype ID: nguyenngochaibkhn.*
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

