I played around a bit, and replacing all colours except the text colour 
works pretty well!

The only problem is that replacing colours with:
im = im.convert("RGB")
pixdata = im.load()
for y in range(im.height):
    for x in range(im.width):
        if pixdata[x, y] != (51, 51, 51):
            pixdata[x, y] = (255, 255, 255)
is a bit slow. Do you know a faster way to replace pixels in Python? I 
don't know if this is off topic.
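
One possible way to speed this up, assuming NumPy is installed alongside 
Pillow, is to compare the whole image array at once instead of visiting 
each pixel in a Python loop. The sketch below is only an illustration: the 
function name keep_only_colour and its parameters are made up for this 
example, and the default colours are the ones from the loop above.

```python
import numpy as np
from PIL import Image


def keep_only_colour(im, keep=(51, 51, 51), fill=(255, 255, 255)):
    """Return a copy of im where every pixel not equal to keep becomes fill.

    keep_only_colour is a hypothetical helper name, not a Pillow function.
    """
    # Array of shape (height, width, 3) with dtype uint8.
    arr = np.array(im.convert("RGB"))
    # True wherever at least one channel differs from the colour to keep.
    mask = (arr != np.array(keep, dtype=np.uint8)).any(axis=-1)
    # Overwrite all non-matching pixels in one vectorised assignment.
    arr[mask] = fill
    return Image.fromarray(arr)
```

Called as im = keep_only_colour(im), this should give the same result as 
the loop, since it also matches (51, 51, 51) exactly.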
On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:

> If you crop the text areas properly, you get good output. E.g.:
>
> [image: r_cropped.png]
>
> > tesseract r_cropped.png - --dpi 300
>
> Rascal Does Not Dream
> of Bunny Girl Senpai
>
> Zdenko
>
>
> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>
>> Here is an example of an image I would like to run OCR on:
>> [image: drop8.png]
>> I would like the results to be like:
>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of Bunny 
>> Girl Senpai", "Keqing Genshin Impact"]
>>
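>> Right now I'm using:
>> from PIL import Image
>> import pytesseract
>>
>> # im is assumed to be the source image, already loaded as a PIL Image.
>> # Crop the two text bands and stack them vertically with a 20 px gap.
>> region1 = im.crop((0, 55, im.width, 110))
>> region2 = im.crop((0, 312, im.width, 360))
>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>> image.paste(region1)
>> image.paste(region2, (0, region1.height + 20))
>> results = pytesseract.image_to_data(image,
>>     output_type=pytesseract.Output.DICT)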
>>
>>
>> The processed image looks like:
>> [image: hi.png]
>> but I am getting results like:
>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai', 
>> 'iGenshinImpact']
>>
>> How do I optimize the image or the config so that the OCR is more accurate?
>>
>> Thank you.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com
>>
>
