Re: [tesseract-ocr] Difficult image, any tips would be appreciated

Chris E. Sun, 13 Nov 2022 06:25:16 -0800

BTW, Google Lens detects ALL of the text in the image perfectly... ;)

On Sunday, November 13, 2022 at 3:10:51 PM UTC+1 Chris E. wrote:


> Hi Lorenzo,
>
> thank you so much for your ideas! Unfortunately, I don't think I can get
> better image quality. It's a VGA signal that's being grabbed, and well,
> that's the result. Maybe I'll try a different converter.
> I did some more tests, too, and the only way I found to get slightly
> better results is to segment the image manually and then feed the
> individual segments into tesseract. My problem is that I need to rely on
> the results (perhaps not 99%, but at least 90%), and that sounds pretty
> hard to achieve.
>
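The segment-then-OCR approach described above could be sketched roughly as follows. This is only an illustration, not the code Chris used: the box names and coordinates are hypothetical, and the default OCR call assumes the `pytesseract` wrapper plus a local Tesseract install, with `--psm 7` (treat the crop as a single text line) as a guess at a sensible mode for small segments.

```python
import numpy as np


def ocr_segments(gray, boxes, ocr=None):
    """Crop each named box out of a grayscale frame and OCR it separately.

    boxes maps a (hypothetical) field name to (x, y, width, height).
    ocr is any callable taking an image array and returning a string.
    """
    if ocr is None:
        # Assumption: pytesseract and the Tesseract binary are installed.
        import pytesseract

        def ocr(img):
            return pytesseract.image_to_string(img, config="--psm 7")

    results = {}
    for name, (x, y, w, h) in boxes.items():
        crop = gray[y:y + h, x:x + w]
        results[name] = ocr(crop).strip()
    return results
```

Passing the OCR function in as a parameter makes it easy to swap in a different engine or a differently configured tesseract call per segment.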
> Greetings, 
> Chris
>
>
>
>
> On Sunday, November 13, 2022 at 1:41:36 PM UTC+1 Lorenzo Blz wrote:
>
>> I did it by hand with Gimp.
>>
>> The code depends on what you know about the image. If it has a fixed size
>> and a fixed location, you can easily do this with, for example, Python and
>> OpenCV: crop, invert header, two different thresholds.
>>
>> If the size/alignment are not fixed you could use SIFT to align the image 
>> with a fixed template (or use Hough lines to rotate it or something similar 
>> if there is not a lot of perspective correction to do).
>>
>> If it is aligned but not fixed size, you can detect the darkest part with 
>> threshold and findContours (with open/close/erode to clean the image) or in 
>> simpler ways; it really depends on how much the gray tones change between 
>> frames. You could do a floodFill in a few known locations of the header 
>> with a different color and find the contours for this colored region (and 
>> use the rectangle rotation to rotate the image, if needed).
>>
>> It may take a few hours or a few days depending on the images.
>>
>>
>> Bye
>>
>> Lorenzo
>>
>>
>>
>> On Sun, 13 Nov 2022 at 13:27, Mehmet Furkan <
>> bakirmeh...@gmail.com> wrote:
>>
>>> Wow, good job! Could you share the source code for this OCR? If that's 
>>> okay, I'd be really happy.
>>>
>>> On Sunday, 13 November 2022 at 14:15:17 UTC+3 Lorenzo Blz wrote:
>>>
>>>> Hi Chris,
>>>> you should try to get something like this:
>>>>
>>>> [image: temp2b.jpg]
>>>>
>>>>
>>>>
>>>> I inverted the header section and then applied a different threshold to 
>>>> each part. If you are not interested in the titles, you can just crop 
>>>> them out.
>>>>
>>>> The image is blurry; maybe it was upscaled a little? If so, try 
>>>> different upscale factors, preferably whole numbers like 2x, 3x, etc., 
>>>> to see if that improves things. Alternatively, check whether other 
>>>> frames from the video are better, or improve the video capture itself 
>>>> (resolution, lighting, frame rate, etc.).
>>>>
>>>> This is what I get:
>>>>
>>>> Modes Dunchieachiungszet
>>>>
>>>> = ro
>>>> oF wn [3
>>>> HF omen | mm
>>>> Gesamt 00s 0%
>>>>
>>>> Quite unusable, but at least it is starting to find something.
>>>>
>>>>
>>>> I think training will help IF all your images have this kind of blurry 
>>>> text and you use actual crops from these images for training.
>>>>
>>>>
>>>> Bye
>>>>
>>>> Lorenzo
>>>>
>>>> On Sat, 12 Nov 2022 at 18:57, Chris E. <goaf...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to OCR this kind of image, which comes from a video grabber and 
>>>>> is unfortunately of pretty bad quality. With tesseract's default 
>>>>> options, the result is pretty useless.
>>>>> Before I start digging deeper into training tesseract, I would love to 
>>>>> hear some recommendations. Would it be possible to achieve a good result 
>>>>> from this kind of image with proper training?
>>>>> Any further ideas/tips would be appreciated!
>>>>>
>>>>> Greetings,
>>>>> Chris
>>>>>
>>>>> [image: temp2.jpg]
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/edf2898d-e442-46a5-bf0c-46f38561c20en%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/edf2898d-e442-46a5-bf0c-46f38561c20en%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>
