I played around a bit, and replacing all colours except the text colour works pretty well!
The only thing is that replacing colours with:

    im = im.convert("RGB")
    pixdata = im.load()
    for y in range(im.height):
        for x in range(im.width):
            if pixdata[x, y] != (51, 51, 51):
                pixdata[x, y] = (255, 255, 255)

is a bit slow. Do you know a better way to replace pixels in Python? I don't know if this is off topic.

On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:

> If you properly crop text areas you get good output. E.g.
>
> [image: r_cropped.png]
>
> > tesseract r_cropped.png - --dpi 300
>
> Rascal Does Not Dream
> of Bunny Girl Senpai
>
> Zdenko
>
> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>
>> here is an example of an image I would like to use OCR on:
>> [image: drop8.png]
>> I would like the results to be like:
>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of Bunny
>> Girl Senpai", "Keqing Genshin Impact"]
>>
>> Right now I'm using
>>     region1 = im.crop((0, 55, im.width, 110))
>>     region2 = im.crop((0, 312, im.width, 360))
>>     image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>     image.paste(region1)
>>     image.paste(region2, (0, region1.height + 20))
>>     results = pytesseract.image_to_data(image,
>>         output_type=pytesseract.Output.DICT)
>>
>> the processed image looks like
>> [image: hi.png]
>> but I am getting results like:
>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai',
>> 'iGenshinImpact']
>>
>> How do I optimize the image/configs so the OCR is more accurate?
>>
>> Thank you.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/1a2fa0e4-b998-4931-ad7d-ae069a46568bn%40googlegroups.com.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3c60a0fd-a213-4caa-8a0d-6888a116b08an%40googlegroups.com.
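
On the pixel-replacement question in the message above: one possible way to avoid the per-pixel Python loop is to let Pillow build a mask and paste the fill colour through it, so the comparison runs inside Pillow's own image operations. This is only a sketch under assumptions: the helper name `keep_only_colour` and its parameters are made up for illustration, and the text colour is taken to be exactly (51, 51, 51) as in the loop. Whether it is actually faster on a given image has not been measured here.

```python
from PIL import Image, ImageChops


def keep_only_colour(im, keep=(51, 51, 51), fill=(255, 255, 255)):
    """Return an RGB copy of `im` where every pixel != `keep` becomes `fill`.

    Hypothetical helper, not part of Pillow or pytesseract.
    """
    rgb = im.convert("RGB")

    # Per-channel absolute difference to a solid image of the kept colour:
    # the result is (0, 0, 0) exactly where the pixel equals `keep`.
    target = Image.new("RGB", rgb.size, keep)
    diff = ImageChops.difference(rgb, target)

    # Collapse the three channels into one: the per-pixel maximum is
    # non-zero if any channel differs from `keep`.
    r, g, b = diff.split()
    any_diff = ImageChops.lighter(ImageChops.lighter(r, g), b)

    # Turn "differs at all" into a hard 0/255 mask.
    mask = any_diff.point(lambda v: 255 if v else 0)

    # Paste the fill colour only where the mask is set.
    out = rgb.copy()
    out.paste(fill, mask=mask)
    return out
```

It would be used in place of the loop, e.g. `im = keep_only_colour(im)` before cropping the regions. Converting the image to a NumPy array and using a boolean mask is another common option, at the cost of an extra dependency.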