But won't multiple OCR runs and crops take a lot of time?

On Wednesday, December 29, 2021 at 10:15:26 AM UTC-8 zdenop wrote:
> IMO, if the text is always in the same area, cropping and OCRing just that
> area will be faster.
>
> Zdenko
>
> On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <cyrus...@gmail.com> wrote:
>
>> I played around a bit with replacing all colours except the text colour,
>> and it works pretty well!
>>
>> The only thing is that replacing colours with:
>>
>> im = im.convert("RGB")
>> pixdata = im.load()
>> for y in range(im.height):
>>     for x in range(im.width):
>>         if pixdata[x, y] != (51, 51, 51):
>>             pixdata[x, y] = (255, 255, 255)
>>
>> is a bit slow. Do you know a better way to replace pixels in Python? I
>> don't know if this is off topic.
>>
>> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:
>>
>>> If you properly crop the text areas, you get good output. E.g.
>>>
>>> [image: r_cropped.png]
>>>
>>> > tesseract r_cropped.png - --dpi 300
>>>
>>> Rascal Does Not Dream
>>> of Bunny Girl Senpai
>>>
>>> Zdenko
>>>
>>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>>>
>>>> Here is an example of an image I would like to use OCR on:
>>>> [image: drop8.png]
>>>> I would like the results to be like:
>>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of
>>>> Bunny Girl Senpai", "Keqing Genshin Impact"]
>>>>
>>>> Right now I'm using:
>>>>
>>>> region1 = im.crop((0, 55, im.width, 110))
>>>> region2 = im.crop((0, 312, im.width, 360))
>>>> image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>>> image.paste(region1)
>>>> image.paste(region2, (0, region1.height + 20))
>>>> results = pytesseract.image_to_data(image,
>>>>     output_type=pytesseract.Output.DICT)
>>>>
>>>> The processed image looks like:
>>>> [image: hi.png]
>>>> but I am getting results like:
>>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai',
>>>> 'iGenshinImpact']
>>>>
>>>> How do I optimize the image/configs so the OCR is more accurate?
>>>>
>>>> Thank you.
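
On the quoted question about a faster way to replace pixels than the nested Python loop: one common approach is to do the comparison on a NumPy array instead of pixel by pixel. The following is only a minimal sketch. It assumes NumPy is installed alongside Pillow (NumPy is not mentioned anywhere in the thread), and the helper name `keep_only_colour` is made up for illustration. The colour values are the ones from the quoted loop.

```python
import numpy as np
from PIL import Image

TEXT_COLOUR = (51, 51, 51)      # text colour used in the quoted loop
BACKGROUND = (255, 255, 255)    # replacement colour used in the quoted loop


def keep_only_colour(im, colour=TEXT_COLOUR, background=BACKGROUND):
    """Return a new RGB image in which every pixel that is not exactly
    `colour` has been replaced by `background`. (Hypothetical helper.)"""
    # Copy the pixels into an array of shape (height, width, 3), dtype uint8.
    arr = np.array(im.convert("RGB"))
    # True where all three channels match the wanted colour.
    is_text = (arr == colour).all(axis=-1)
    # Overwrite every non-matching pixel in a single vectorised assignment.
    arr[~is_text] = background
    return Image.fromarray(arr)
```

Usage would be something like `clean = keep_only_colour(im)` before cropping or passing the image to pytesseract. The input image is left unchanged. Because the comparison and assignment run in compiled code rather than in a Python-level loop, this is typically much faster on full-size images, though the actual gain depends on the image size.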
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/8d80ed59-6163-48c9-adb8-975d8274a9adn%40googlegroups.com.