Hi Lorenzo, thank you so much for your ideas! Unfortunately, I don't think I can get better image quality. It's a VGA signal that's being grabbed, and well, that's the result. Maybe I'll try a different converter. I did some more tests, too, and the only way I found to get slightly better results is to segment the image manually and then feed the individual segments into tesseract. My problem is that I need to be able to rely on the results (perhaps not 99% correct, but at least 90%), and that sounds pretty hard to achieve.
Greetings,
Chris

On Sunday, November 13, 2022 at 1:41:36 PM UTC+1 Lorenzo Blz wrote:

> I did it by hand with Gimp.
>
> The code depends on what you know about the image. If it is fixed size and
> fixed location you can easily do this, for example, with python and opencv:
> crop, invert header, two different thresholds.
>
> If the size/alignment are not fixed you could use SIFT to align the image
> with a fixed template (or use Hough lines to rotate it or something similar
> if there is not a lot of perspective correction to do).
>
> If it is aligned but not fixed size, you can detect the darkest part with
> threshold and findContours (with open/close/erode to clean the image) or in
> simpler ways, it really depends how much the gray tones change between
> frames. You could do a floodFill in a few known locations of the header with
> a different color and find the contours for this colored region (and use
> the rectangle rotation to rotate the image, if needed).
>
> It may take a few hours or a few days depending on the images.
>
> Bye
>
> Lorenzo
>
> On Sun, 13 Nov 2022 at 13:27, Mehmet Furkan <bakirmeh...@gmail.com> wrote:
>
>> Wow, good job! Could you share the source code of this ocr? If that's
>> okay, I'll be really happy.
>>
>> On Sunday, 13 November 2022 at 14:15:17 UTC+3 Lorenzo Blz wrote:
>>
>>> Hi Chris,
>>> you should try to get something like this:
>>>
>>> [image: temp2b.jpg]
>>>
>>> I inverted the header section and then applied a different threshold to
>>> each part. If you are not interested in the titles you can just crop them
>>> out.
>>>
>>> The image is blurry, maybe it was upscaled a little? If so, try
>>> different levels of upscale, probably better if full integers like 2x, 3x,
>>> etc. to see if it improves. Or see if other frames from the video might be
>>> better or improve the video capture (resolution, lighting, frame rate,
>>> etc.).
>>>
>>> This is what I get:
>>>
>>> Modes Dunchieachiungszet
>>>
>>> = ro
>>> oF wn [3
>>> HF omen | mm
>>> Gesamt 00s 0%
>>>
>>> quite unusable but at least it is starting to find something.
>>>
>>> I think training will help IF all your images have this kind of blurry
>>> text and you use actual crops from these images for training.
>>>
>>> Bye
>>>
>>> Lorenzo
>>>
>>> On Sat, 12 Nov 2022 at 18:57, Chris E. <goaf...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to OCR this kind of image, which is from a video grabber,
>>>> unfortunately of pretty bad quality. With the default options of
>>>> tesseract, it's pretty useless.
>>>> Before I start digging deeper into training tesseract, I would love to
>>>> hear some recommendations. Would it be possible to achieve a good result
>>>> from this kind of image with proper training?
>>>> Any further ideas/tips would be appreciated!
>>>>
>>>> Greetings,
>>>> Chris
>>>>
>>>> [image: temp2.jpg]
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/edf2898d-e442-46a5-bf0c-46f38561c20en%40googlegroups.com
>>>> .
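For the fixed-size, fixed-location case mentioned in the thread ("crop, invert header, two different thresholds"), a minimal sketch could look like the following. It is not Lorenzo's actual code (he did the example by hand in Gimp). It uses plain NumPy arrays, which is also how OpenCV represents images in Python, and it assumes the header is a band of light text on a dark background at the top of the frame. The function names, `header_rows`, and both threshold values are hypothetical and would need tuning on real frames.

```python
import numpy as np


def binarize(gray, threshold):
    """Pixels above `threshold` become white (255), all others black (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def preprocess(gray, header_rows, header_threshold, body_threshold, crop=None):
    """Crop, invert the header band, and threshold header and body separately.

    gray: 2-D uint8 array holding a grayscale frame.
    header_rows: number of rows at the top forming the header band
                 (assumed here to be light text on a dark background).
    crop: optional (top, bottom, left, right) box applied first.
    """
    if crop is not None:
        top, bottom, left, right = crop
        gray = gray[top:bottom, left:right]

    # Invert only the header so that its text becomes dark on light,
    # matching the body.
    header = 255 - gray[:header_rows]
    body = gray[header_rows:]

    # Each part gets its own threshold, then the parts are stacked back.
    return np.vstack([
        binarize(header, header_threshold),
        binarize(body, body_threshold),
    ])


# Tiny synthetic frame: dark header with one bright "text" pixel,
# light body with one dark "text" pixel.
frame = np.full((6, 4), 220, dtype=np.uint8)
frame[:2] = 40
frame[0, 1] = 200
frame[4, 2] = 90

result = preprocess(frame, header_rows=2, header_threshold=128, body_threshold=128)
```

With OpenCV, `cv2.bitwise_not` and `cv2.threshold` would do the same inversion and thresholding. The cleaned image, or individual segments cut from it, could then be passed to tesseract, for example through `pytesseract.image_to_string`.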