BTW, Google Lens detects ALL text on the image perfectly... ;)

On Sunday, November 13, 2022 at 3:10:51 PM UTC+1 Chris E. wrote:
> Hi Lorenzo,
>
> thank you so much for your ideas! Unfortunately, I don't think I can get a
> better image quality. It's a VGA signal that's being grabbed, and well,
> that's the result. Maybe I'll try a different converter.
> I did some more tests, too, and the only way I found to get slightly
> better results is to segment the image manually and then feed the
> individual segments into tesseract. My problem is that I need to rely on
> the results (perhaps not 99%, but at least 90%), and that sounds pretty
> hard to achieve.
>
> Greetings,
> Chris
>
> On Sunday, November 13, 2022 at 1:41:36 PM UTC+1 Lorenzo Blz wrote:
>
>> I did it by hand with Gimp.
>>
>> The code depends on what you know about the image. If it is fixed size
>> and fixed location you can easily do this, for example, with python and
>> opencv: crop, invert header, two different thresholds.
>>
>> If the size/alignment are not fixed you could use SIFT to align the image
>> with a fixed template (or use Hough lines to rotate it or something similar
>> if there is not a lot of perspective correction to do).
>>
>> If it is aligned but not fixed size, you can detect the darkest part with
>> threshold and findContours (with open/close/erode to clean the image) or in
>> simpler ways, it really depends how much the gray tones change between
>> frames. You could do a floodFill in a few known locations of the header with
>> a different color and find the contours for this colored region (and use
>> the rectangle rotation to rotate the image, if needed).
>>
>> It may take a few hours or a few days depending on the images.
>>
>> Bye
>>
>> Lorenzo
>>
>> On Sun, 13 Nov 2022 at 13:27, Mehmet Furkan <bakirmeh...@gmail.com> wrote:
>>
>>> Waaw, good job! Could you share the source code of this ocr? If that's
>>> okay, I'll be really happy.
>>>
>>> On Sunday, 13 November 2022 at 14:15:17 UTC+3 Lorenzo Blz wrote:
>>>
>>>> Hi Chris,
>>>> you should try to get something like this:
>>>>
>>>> [image: temp2b.jpg]
>>>>
>>>> I inverted the header section and then applied two different thresholds,
>>>> one to each part. If you are not interested in the titles you can just
>>>> crop them out.
>>>>
>>>> The image is blurry, maybe it was upscaled a little? If so, try
>>>> different levels of upscale, probably better with whole-number factors
>>>> like 2x, 3x, etc. to see if it improves. Or see if other frames from the
>>>> video might be better or improve the video capture (resolution, lighting,
>>>> frame rate, etc.).
>>>>
>>>> This is what I get:
>>>>
>>>> Modes Dunchieachiungszet
>>>>
>>>> = ro
>>>> oF wn [3
>>>> HF omen | mm
>>>> Gesamt 00s 0%
>>>>
>>>> quite unusable but at least it is starting to find something.
>>>>
>>>> I think training will help IF all your images have this kind of blurry
>>>> text and you use actual crops from these images for training.
>>>>
>>>> Bye
>>>>
>>>> Lorenzo
>>>>
>>>> On Sat, 12 Nov 2022 at 18:57, Chris E. <goaf...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to OCR this kind of image, which is from a video grabber,
>>>>> unfortunately of pretty bad quality. With the default options of
>>>>> tesseract, it's pretty useless.
>>>>> Before I start digging deeper into training tesseract, I would love to
>>>>> hear some recommendations. Would it be possible to achieve a good result
>>>>> from this kind of image with proper training?
>>>>> Any further ideas/tips would be appreciated!
>>>>>
>>>>> Greetings,
>>>>> Chris
>>>>>
>>>>> [image: temp2.jpg]
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/edf2898d-e442-46a5-bf0c-46f38561c20en%40googlegroups.com
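
For anyone reading this thread later: the fixed-layout recipe Lorenzo describes (crop, invert the header, two different thresholds, integer upscale) might look roughly like the sketch below. It is not his actual code, which was never posted; he did it by hand in Gimp. To stay self-contained it uses plain NumPy instead of OpenCV. The function name `preprocess`, the `header_rows` parameter, and the threshold values of 128 are assumptions made up for illustration, and it assumes the frame is already cropped to the table and that the dark header band has a known, fixed height.

```python
import numpy as np


def preprocess(gray, header_rows, header_thresh=128, body_thresh=128, scale=2):
    """Invert the header band, binarize header and body separately, upscale.

    gray          2-D uint8 array: a grayscale frame already cropped to the table
    header_rows   number of top rows forming the dark header band (assumed fixed)
    *_thresh      hypothetical cut-offs; real frames will need tuning
    scale         integer upscale factor (2, 3, ...)
    """
    if gray.ndim != 2 or gray.dtype != np.uint8:
        raise ValueError("expected a 2-D uint8 grayscale image")
    if not 0 < header_rows < gray.shape[0]:
        raise ValueError("header_rows must lie inside the image")

    # Light-on-dark header text becomes dark-on-light, matching the body.
    header = 255 - gray[:header_rows]
    body = gray[header_rows:]

    # Two independent thresholds, because the two regions differ in contrast.
    header_bin = np.where(header > header_thresh, 255, 0).astype(np.uint8)
    body_bin = np.where(body > body_thresh, 255, 0).astype(np.uint8)

    out = np.vstack([header_bin, body_bin])

    # Nearest-neighbour upscale by a whole-number factor.
    return np.repeat(np.repeat(out, scale, axis=0), scale, axis=1)


# Tiny synthetic frame: dark header with light text, light body with dark text.
frame = np.full((6, 8), 200, dtype=np.uint8)
frame[:2] = 40          # header background
frame[0, 1:3] = 220     # header "text"
frame[4, 2:5] = 30      # body "text"
cleaned = preprocess(frame, header_rows=2)
print(cleaned.shape)
```

With OpenCV installed, `cv2.threshold` and `cv2.resize` would be the natural replacements for the NumPy steps. The header and body parts could also be fed to tesseract separately, as Chris describes, rather than stacked back together. Whether any of this gets near the 90% reliability he needs on the VGA grabs is something only testing on the real frames can answer.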