Re: [tesseract-ocr] Difficult image, any tips would be appreciated

2022-11-13 Thread Lorenzo Bolzani
Hi Chris, you should try to get something like this: [image: temp2b.jpg] I inverted the headers section and then applied a different threshold to each part. If you are not interested in the titles you can just crop them out. The image is blurry, maybe it was upscaled a little? If so, try differe

Re: [tesseract-ocr] Difficult image, any tips would be appreciated

2022-11-13 Thread Mehmet Furkan
Waaw, good job! Could you share the source code of this ocr? If that's okay, I'll be really happy. On Sunday, 13 November 2022 at 14:15:17 UTC+3 Lorenzo Blz wrote: > Hi Chris, > you should try to get something like this: > > [image: temp2b.jpg] > > > > I inverted the headers section and then did

Re: [tesseract-ocr] Difficult image, any tips would be appreciated

2022-11-13 Thread Lorenzo Bolzani
I did it by hand with GIMP. The code depends on what you know about the image. If it is fixed size and fixed location you can easily do this, for example, with Python and OpenCV: crop, invert header, two different thresholds. If the size/alignment are not fixed you could use SIFT to align the ima

Re: [tesseract-ocr] Difficult image, any tips would be appreciated

2022-11-13 Thread Chris E.
Hi Lorenzo, thank you so much for your ideas! Unfortunately, I don't think I can get a better image quality. It's a VGA signal that's being grabbed, and well, that's the result. Maybe I'll try a different converter. I did some more tests, too, and the only way I found to get a little better res

Re: [tesseract-ocr] Difficult image, any tips would be appreciated

2022-11-13 Thread Chris E.
BTW, Google Lens detects ALL text on the image perfectly ;) On Sunday, November 13, 2022 at 3:10:51 PM UTC+1 Chris E. wrote: > Hi Lorenzo, > > thank you so much for your ideas! Unfortunately, I don't think I can get a > better image quality. It's a VGA signal that's being grabbed, and well,

[tesseract-ocr] Re: Difficult image, any tips would be appreciated

2022-11-13 Thread Tom Morris
The image has "mosquito noise" around the characters which indicates that it's been compressed with JPEG or a similar algorithm. You should definitely try to avoid any compression at this low a resolution. I think your idea of investigating different video capture devices is a good one. It looks

[tesseract-ocr] Re: Difficult image, any tips would be appreciated

2022-11-13 Thread Chris E.
Hi Tom, the compression artifacts are of course easy to avoid, but the “ghosting” in the image is definitely a severe problem. I noticed that, too, but I have no idea what the reason could be. Again, a different AD converter could help. I already tried to clean the “ghosting”, but had no succes