Re: [tesseract-ocr] bad quality!?

Cyrus Yip Wed, 29 Dec 2021 12:18:42 -0800

But won't running OCR on several separate crops take a lot of time?

On Wednesday, December 29, 2021 at 10:15:26 AM UTC-8 zdenop wrote:


> IMO, if the text is always in the same area, cropping the image and running 
> OCR on just that area will be faster.
>
> Zdenko
>
>
> On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <cyrus...@gmail.com> wrote:
>
>> I played around a bit with replacing every colour except the text colour, 
>> and it works pretty well!
>>
>> The only problem is that replacing colours with:
>> im = im.convert("RGB")
>> pixdata = im.load()
>> for y in range(im.height):
>>     for x in range(im.width):
>>         if pixdata[x, y] != (51, 51, 51):
>>             pixdata[x, y] = (255, 255, 255)
>> is a bit slow. Do you know a faster way to replace pixels in Python? I 
>> don't know if this is off topic.
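
One possible answer to the speed question above is to let NumPy do the comparison on the whole image at once instead of looping over pixels in Python. This is only a sketch, assuming NumPy is installed next to Pillow; `keep_only_colour` is a made-up helper name, and the default colours are the ones from the loop above.

```python
import numpy as np
from PIL import Image


def keep_only_colour(im, keep=(51, 51, 51), fill=(255, 255, 255)):
    """Return a copy of `im` where every pixel not equal to `keep` becomes `fill`."""
    # np.array() copies the pixel data into a writable (height, width, 3) uint8 array.
    arr = np.array(im.convert("RGB"))
    # True for every pixel that differs from `keep` in at least one channel.
    mask = (arr != keep).any(axis=-1)
    # Overwrite all non-matching pixels in one vectorised assignment.
    arr[mask] = fill
    return Image.fromarray(arr)
```

Calling `im = keep_only_colour(im)` should give the same image as the nested loops, and is usually much quicker on large images because the per-pixel work happens inside NumPy rather than in the Python interpreter.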
>> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:
>>
>>> If you crop the text areas properly, you get good output. E.g.:
>>>
>>> [image: r_cropped.png]
>>>
>>> > tesseract r_cropped.png - --dpi 300
>>>
>>> Rascal Does Not Dream
>>> of Bunny Girl Senpai
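
For reference, the same crop-then-OCR idea might look roughly like this from Python. It is a sketch, not a tested recipe: `ocr_regions` is a hypothetical helper name, the `ocr` parameter exists only so the cropping can be tried without Tesseract installed, and `--dpi 300` is simply copied from the command above.

```python
from PIL import Image


def ocr_regions(im: Image.Image, boxes, ocr=None):
    """Crop each (left, upper, right, lower) box from `im` and OCR it on its own."""
    if ocr is None:
        # Imported lazily so the cropping logic can run without Tesseract installed.
        import pytesseract

        def ocr(region):
            # Same DPI hint as the command-line example.
            return pytesseract.image_to_string(region, config="--dpi 300")

    # One OCR call per cropped region; surrounding whitespace is stripped.
    return [ocr(im.crop(box)).strip() for box in boxes]
```

With the crop coordinates quoted further down in this thread, a call could be `ocr_regions(im, [(0, 55, im.width, 110), (0, 312, im.width, 360)])`. Whether those boxes fit every image is an assumption that would need checking.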
>>>
>>> Zdenko
>>>
>>>
>>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>>>
>>>> Here is an example of an image I would like to run OCR on:
>>>> [image: drop8.png]
>>>> I would like the results to be like:
>>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of 
>>>> Bunny Girl Senpai", "Keqing Genshin Impact"]
>>>>
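>>>> Right now I'm using:
>>>> from PIL import Image
>>>> import pytesseract
>>>>
>>>> # im is the source image, already opened with PIL
>>>> # Crop the two text rows and stack them with a 20 px gap between them
>>>> region1 = im.crop((0, 55, im.width, 110))
>>>> region2 = im.crop((0, 312, im.width, 360))
>>>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>>> image.paste(region1)
>>>> image.paste(region2, (0, region1.height + 20))
>>>> # Word-level OCR results as a dict of parallel lists
>>>> results = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)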
>>>>
>>>>
>>>> The processed image looks like:
>>>> [image: hi.png]
>>>> but I'm getting results like:
>>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai', 
>>>> 'iGenshinImpact']
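
The thread does not show how the `image_to_data` dict was turned into the strings above, so this is a guess at one contributing factor: if the recognised words were concatenated directly, the spaces between them would be lost. A small sketch of grouping the words by text line and joining them with spaces, where `lines_from_data` and `min_conf` are hypothetical names and the dict keys are the ones pytesseract returns for `Output.DICT`:

```python
from collections import defaultdict


def lines_from_data(data, min_conf=0):
    """Join the words of an image_to_data DICT into one string per text line."""
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # Block, paragraph and line rows carry empty text; skip them.
        if not word.strip():
            continue
        # conf may be an int, float or string depending on the pytesseract version.
        if float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(word)
    # Sorting by (block, paragraph, line) keeps the lines in reading order.
    return [" ".join(words) for _, words in sorted(lines.items())]
```

Raising `min_conf` drops low-confidence words, which may help with stray symbols picked up from icons, but the right threshold is something to find by experiment.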
>>>>
>>>> How can I optimize the image or the config so that the OCR is more accurate?
>>>>
>>>> Thank you.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>

