IMO, if the text is always in the same area, cropping to that area and
running OCR on just that part will be faster.
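
A minimal sketch of that idea, combined with a vectorized version of the colour replacement from the quoted message. It assumes NumPy is available alongside Pillow. The crop rows and the (51, 51, 51) text colour are taken from the quoted messages, and the helper names `crop_text_rows` and `keep_only_text_colour` are only illustrative. Note that an exact colour match also whitens anti-aliased edge pixels, so a tolerance may work better on some images.

```python
import numpy as np
from PIL import Image

# Values taken from the quoted messages; adjust them for your own images.
TEXT_COLOUR = (51, 51, 51)
TEXT_ROWS = [(55, 110), (312, 360)]  # (top, bottom) of each text band


def crop_text_rows(im, rows=TEXT_ROWS):
    """Crop each (top, bottom) band across the full width of the image."""
    return [im.crop((0, top, im.width, bottom)) for top, bottom in rows]


def keep_only_text_colour(im, text_colour=TEXT_COLOUR):
    """Return a copy of im where every pixel that is not text_colour is white."""
    # np.array() makes a writable copy with shape (height, width, 3).
    arr = np.array(im.convert("RGB"))
    # True wherever at least one channel differs from the text colour.
    not_text = np.any(arr != np.array(text_colour, dtype=arr.dtype), axis=-1)
    arr[not_text] = (255, 255, 255)
    return Image.fromarray(arr)
```

Each cleaned band can then be passed to `pytesseract.image_to_string` (or `image_to_data`) on its own. Cropping first also means the colour replacement touches far fewer pixels.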

Zdenko


On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <cyruscmy...@gmail.com> wrote:

> I played around a bit and tried replacing all colours except the text
> colour, and it works pretty well!
>
> The only problem is that replacing colours with:
> im = im.convert("RGB")
> pixdata = im.load()
> for y in range(im.height):
>     for x in range(im.width):
>         if pixdata[x, y] != (51, 51, 51):
>             pixdata[x, y] = (255, 255, 255)
> is a bit slow. Do you know a better way to replace pixels in Python? I
> don't know if this is off topic.
> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:
>
>> If you properly crop text areas you get good output. E.g.
>>
>> [image: r_cropped.png]
>>
>> > tesseract r_cropped.png - --dpi 300
>>
>> Rascal Does Not Dream
>> of Bunny Girl Senpai
>>
>> Zdenko
>>
>>
>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>>
>>> Here is an example of an image I would like to use OCR on:
>>> [image: drop8.png]
>>> I would like the results to be like:
>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of Bunny
>>> Girl Senpai", "Keqing Genshin Impact"]
>>>
>>> Right now I'm using
>>> region1 = im.crop((0, 55, im.width, 110))
>>> region2 = im.crop((0, 312, im.width, 360))
>>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>> image.paste(region1)
>>> image.paste(region2, (0, region1.height + 20))
>>> results = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
>>>
>>>
>>> The processed image looks like
>>> [image: hi.png]
>>> but I'm getting results like:
>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai',
>>> 'iGenshinImpact']
>>>
>>> How do I optimize the image/configs so the OCR is more accurate?
>>>
>>> Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
