Re: [tesseract-ocr] Tesseract cannot read text on stripe background / but Google AI can

Ajinkya Bobade Fri, 04 Jun 2021 12:13:49 -0700

Following up: try uploading images of real-world documents. Please avoid taking
photos of photos (that is, photos of a computer screen showing a document).
Capture the real document and upload that.

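Regarding the gray background pixels described further down in the thread: a common first preprocessing step is global Otsu thresholding. It picks the gray level that best separates a dark class (text) from a light class (background). The sketch below is a minimal, dependency-free illustration. It assumes a grayscale image given as rows of 0-255 integers. The function names `otsu_threshold` and `binarize` are hypothetical, and this is not the method used by imagescanner-online.com or Google Document AI.

```python
from typing import List

# Assumption: grayscale image as a list of rows, each pixel an int in 0..255.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the gray level t that maximizes between-class variance (Otsu's method).

    Pixels <= t form the dark class (text), pixels > t the light class (background).
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_low = 0      # number of pixels with value <= t
    sum_low = 0    # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no dark pixels yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is in the dark class; no split left
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Map pixels <= t to black (0) and everything else to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in img]


if __name__ == "__main__":
    # Toy image: dark text (30), two gray stripe levels (170, 200), white paper (250).
    img = [[30, 170, 200, 250], [30, 200, 170, 250]]
    t = otsu_threshold(img)
    print(t, binarize(img, t))  # prints: 30 [[0, 255, 255, 255], [0, 255, 255, 255]]
```

In the toy example the gray stripe pixels end up white and only the text stays black. A single global threshold only works when the stripes are clearly lighter than the letters. Where a stripe is as dark as the text, it will survive, and local (adaptive) thresholding or working on a single color channel may do better. Tesseract already binarizes internally with Otsu's method, so doing it yourself mainly helps when you want to inspect or tune the image before OCR.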

On Sat, Jun 5, 2021, 12:38 AM Ajinkya Bobade <[email protected]>
wrote:

> Hi Timo,
>
> The results are low resolution because the image you uploaded must have
> come from a sample set; it was not taken with a real mobile phone camera.
>
> I recommend uploading an image captured with a good-quality phone camera
> and retrying a few more times with different images taken with a phone
> camera. My software works poorly on sample images that are not real-world
> images. It works excellently on real-world images.
>
> Feel free to reach out to me if you have any questions or concerns.
>
> Regards
> Ajinkya
>
> On Thu, Jun 3, 2021, 4:38 PM Timo Richter <[email protected]> wrote:
>
>> Hi Ajinkya,
>>
>> the result looks better than mine, but it seems to be very low
>> resolution and the text is not readable. How did you do it?
>> Still, the Google AI website is a lot more accurate. How might they have
>> done this?
>>
>>
>> [email protected] wrote on Wednesday, June 2, 2021 at 17:23:44 UTC+2:
>>
>>> Hello,
>>> I have created a web extension that solves this problem. Upload your image
>>> to https://imagescanner-online.com/ and it will remove the noise and
>>> pixel-segment the text so that you get a very good-quality input, which you
>>> can feed to Tesseract to get good output.
>>>
>>> Regards
>>> Ajinkya
>>>
>>> On Wed, Jun 2, 2021 at 12:13 AM Timo Richter <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I have tried to OCR an identity document [1], and large parts were not
>>>> recognised. I get nothing from the headline or the first few rows.
>>>> From the middle onward, Tesseract partially finds the correct text. As
>>>> usual, there are lines and other patterns in the background. In the
>>>> monochrome image I could not completely separate the letters from the
>>>> background; some gray pixels remain. However, there is a website that
>>>> does OCR, and it works perfectly [2]. Why do I get bad results, and why
>>>> does my Tesseract not read the text? What does the website do differently?
>>>>
>>>>
>>>> Thank you in advance,
>>>>
>>>> Timo
>>>>
>>>>
>>>> [1] https://en.wikipedia.org/wiki/Philippine_passport#/media/File:Philippine_passport_(2016_edition)_data_page.jpg (public domain)
>>>> [2] https://cloud.google.com/document-ai#section-2
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/4f6d0261-5e0a-49c8-b6db-3e2b0e4ad9f5n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4f6d0261-5e0a-49c8-b6db-3e2b0e4ad9f5n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>
