IMO, if the text is always in the same area, cropping to that area and running OCR on just the crop will be faster.
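A rough sketch of what I mean, assuming NumPy is available and the image has already been converted to an H x W x 3 uint8 array. The helper name `isolate_text` is made up for illustration; the box order follows PIL's `Image.crop`, and the text colour (51, 51, 51) is the one from your snippet. It crops first, then whitens everything that is not the text colour in one vectorised step instead of a per-pixel loop:

```python
import numpy as np


def isolate_text(rgb, box, text_color=(51, 51, 51)):
    """Crop `rgb` (H x W x 3 uint8 array) to `box` and whiten every pixel
    that is not exactly `text_color`.

    `box` is (left, upper, right, lower), the same order PIL's Image.crop uses.
    """
    left, upper, right, lower = box
    # Crop by slicing; .copy() leaves the caller's array untouched.
    region = rgb[upper:lower, left:right].copy()
    # True where all three channels match the text colour exactly.
    is_text = np.all(region == np.array(text_color, dtype=region.dtype), axis=-1)
    # One vectorised assignment replaces the nested per-pixel loop.
    region[~is_text] = (255, 255, 255)
    return region
```

With Pillow, something like `np.asarray(im.convert("RGB"))` should give the input array, and `Image.fromarray(region)` turns the result back into an image you can pass to pytesseract. For your first text band the box would be `(0, 55, im.width, 110)`. Note that the exact colour match drops anti-aliased edge pixels, so a tolerance may work better on some images.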
Zdenko

On Wed, 29 Dec 2021 at 18:58, Cyrus Yip <cyruscmy...@gmail.com> wrote:

> I played around a bit with replacing all colours except the text colour, and
> it works pretty well!
>
> The only thing is that replacing colours with:
>     im = im.convert("RGB")
>     pixdata = im.load()
>     for y in range(im.height):
>         for x in range(im.width):
>             if pixdata[x, y] != (51, 51, 51):
>                 pixdata[x, y] = (255, 255, 255)
> is a bit slow. Do you know a better way to replace pixels in Python? I
> don't know if this is off topic.
>
> On Wednesday, December 29, 2021 at 9:46:13 AM UTC-8 zdenop wrote:
>
>> If you properly crop text areas you get good output. E.g.
>>
>> [image: r_cropped.png]
>>
>> > tesseract r_cropped.png - --dpi 300
>>
>> Rascal Does Not Dream
>> of Bunny Girl Senpai
>>
>> Zdenko
>>
>> On Wed, 29 Dec 2021 at 18:21, Cyrus Yip <cyrus...@gmail.com> wrote:
>>
>>> Here is an example of an image I would like to use OCR on:
>>> [image: drop8.png]
>>> I would like the results to be like:
>>> ["Naruto Uzumaki Naruto", "Mai Sakurajima Rascal Does Not Dream of Bunny
>>> Girl Senpai", "Keqing Genshin Impact"]
>>>
>>> Right now I'm using
>>>     region1 = im.crop((0, 55, im.width, 110))
>>>     region2 = im.crop((0, 312, im.width, 360))
>>>     image = Image.new("RGB", (im.width, region1.height + region2.height + 20))
>>>     image.paste(region1)
>>>     image.paste(region2, (0, region1.height + 20))
>>>     results = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
>>>
>>> The processed image looks like
>>> [image: hi.png]
>>> but I am getting results like:
>>> [' ', '»MaiSakurajima¥RascalDoesNotDreamofBunnyGirlSenpai',
>>> 'iGenshinImpact']
>>>
>>> How do I optimize the image/configs so the OCR is more accurate?
>>>
>>> Thank you.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wjJOwkqnR50%2BFrCUq8M23RUpMdF3RBihKBCAMcCDdndw%40mail.gmail.com.