I did it by hand with GIMP. What the code looks like depends on what you know about the image. If the table has a fixed size and a fixed location, this is easy to do with, for example, Python and OpenCV: crop, invert the header, and apply two different thresholds (one to the header, one to the body).
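As a rough illustration of the fixed-layout case (crop, invert the header, threshold each part separately), here is a minimal sketch. It is not the procedure used for the attached image, which was done by hand. The function name, the crop box, the header height and both threshold values are hypothetical and would need tuning on real frames. It also assumes the header is light text on a dark background, and it uses plain NumPy so that it runs without OpenCV installed.

```python
import numpy as np


def binarize_table(gray, crop_box, header_rows, header_thresh, body_thresh):
    """Crop a fixed region, invert its header, and threshold header and body.

    gray          2-D uint8 array (a grayscale video frame)
    crop_box      (top, bottom, left, right) of the table; assumed fixed
    header_rows   number of rows at the top of the crop that form the header
    *_thresh      hypothetical cut-off values, one for each part
    """
    top, bottom, left, right = crop_box
    table = gray[top:bottom, left:right]

    # Assumption: the header is light text on a dark background, so inverting
    # it gives dark text on a light background, like the body.
    header = 255 - table[:header_rows]
    body = table[header_rows:]

    # A different threshold for each part: pixels above the cut-off become
    # white (255), everything else black (0).
    header_bin = np.where(header > header_thresh, 255, 0).astype(np.uint8)
    body_bin = np.where(body > body_thresh, 255, 0).astype(np.uint8)

    # Stack the two parts back into a single image.
    return np.vstack([header_bin, body_bin])
```

With OpenCV, the inversion corresponds to cv2.bitwise_not and each threshold step to cv2.threshold with cv2.THRESH_BINARY. The result is black text on a white background in both parts, ready to be passed to tesseract.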
If the size or alignment is not fixed, you could use SIFT to align the image with a fixed template (or use Hough lines to rotate it, or something similar, if there is not much perspective correction to do). If it is aligned but not of fixed size, you can detect the darkest part with threshold and findContours (with open/close/erode to clean up the image) or in simpler ways; it really depends on how much the gray tones change between frames. You could also do a floodFill at a few known locations in the header with a different color and find the contours of this colored region (and use the rectangle's rotation to rotate the image, if needed).

It may take a few hours or a few days, depending on the images.

Bye

Lorenzo

On Sun, 13 Nov 2022 at 13:27, Mehmet Furkan <bakirmehmetfur...@gmail.com> wrote:

> Waaw, good job! Could you share the source code of this ocr? If that's
> okay, I'll be really happy.
>
> On Sunday, 13 November 2022 at 14:15:17 UTC+3 Lorenzo Blz wrote:
>
>> Hi Chris,
>> you should try to get something like this:
>>
>> [image: temp2b.jpg]
>>
>> I inverted the header section and then applied a different threshold to
>> each part. If you are not interested in the titles you can just crop them
>> out.
>>
>> The image is blurry; maybe it was upscaled a little? If so, try different
>> levels of upscaling, probably better with whole-number factors like 2x, 3x,
>> etc., to see if it improves. Or see if other frames from the video might be
>> better, or improve the video capture (resolution, lighting, frame rate,
>> etc.).
>>
>> This is what I get:
>>
>> Modes Dunchieachiungszet
>>
>> = ro
>> oF wn [3
>> HF omen | mm
>> Gesamt 00s 0%
>>
>> Quite unusable, but at least it is starting to find something.
>>
>> I think training will help IF all your images have this kind of blurry
>> text and you use actual crops from these images for training.
>>
>> Bye
>>
>> Lorenzo
>>
>> On Sat, 12 Nov 2022 at 18:57, Chris E. <goaf...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to OCR this kind of image, which is from a video grabber,
>>> unfortunately of pretty bad quality. With the default options of
>>> tesseract, it's pretty useless.
>>> Before I start digging deeper into training tesseract, I would love to
>>> hear some recommendations. Would it be possible to achieve a good result
>>> from this kind of image with proper training?
>>> Any further ideas/tips would be appreciated!
>>>
>>> Greetings,
>>> Chris
>>>
>>> [image: temp2.jpg]

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAMgOLLzyyKDj0jVfs94EJg56G%2BO0oDbmMipz0N4w1aE%2Bh9Mprw%40mail.gmail.com.