Re: [tesseract-ocr] How to optimize tesseract to maximum speed for single number (several digits) recognition

Zdenko Podobny Tue, 29 Jan 2019 22:58:07 -0800

search issue tracker for "speed"...

Zdenko



On Wed, 30 Jan 2019 at 7:51, Jan Pohanka <[email protected]> wrote:

> It is 4.0. I'm satisfied with the recognition results, but I need to make it
> faster (consistently below 1 s)...
>
> On Wednesday, 30 January 2019 at 7:48:23 UTC+1, zdenop wrote:
>>
>> What is your tesseract version?
>>
>> Zdenko
>>
>>
>> On Tue, 29 Jan 2019 at 20:40, Jan Pohanka <[email protected]> wrote:
>>
>>> Thanks for the suggestions. You are right that I'm referring to the
>>> api.GetUTF8Text() call; it is my bottleneck.
>>> I was not aware that Tesseract 4.0 has separate "fast" and "best" models;
>>> I will give them a try. So far I have used just lang=eng or osd.
>>> What I find suspicious is that the calls take longer as time goes on. To
>>> be more precise, the first 10-15 calls take up to 500 ms and later ones
>>> rise above 1 s...
>>> Moving SetSourceResolution outside of the loop makes no difference,
>>> unfortunately.
>>>
>>> BR
>>> Jan
>>>
>>> On Tuesday, 29 January 2019 at 18:08:49 UTC+1, Lorenzo Blz wrote:
>>>>
>>>>
>>>> First, double-check that the Pi is not throttling due to overheating or
>>>> insufficient USB power. This may cause the slowdown.
>>>>
>>>> Usually a text height of 30-50 px is fine. If the problem is Tesseract,
>>>> try the fast model (or the "normal" one if you are using best). I assume
>>>> you are using the 4.x release.
>>>>
>>>> Run tesseract -v to see whether you are using all the available CPU
>>>> optimizations.
>>>>
>>>> Try moving SetSourceResolution outside the loop and see whether it
>>>> changes anything (maybe it invalidates some caches or something).
>>>>
>>>> The time you are referring to is for one single api.GetUTF8Text() call,
>>>> correct?
>>>>
>>>>
>>>> Lorenzo
>>>>
>>>>
>>>> On Tue, 29 Jan 2019 at 17:48, Jan Pohanka <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm building a simple device that recognizes numbers in pictures taken
>>>>> by a webcam. Everything runs on a Raspberry Pi 3.
>>>>> The whole thing is the following simple loop (in Python for simplicity,
>>>>> but it is the same with the C++ API); the images are preprocessed to
>>>>> black and white:
>>>>>
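>>>>> import tesserocr
>>>>> from tesserocr import PyTessBaseAPI
>>>>>
>>>>> # Treat each image as a single word (one number made of several digits).
>>>>> api = PyTessBaseAPI(psm=tesserocr.PSM.SINGLE_WORD)
>>>>>
>>>>> for im in images:  # images: preprocessed black-and-white PIL images
>>>>>     api.SetImage(im)
>>>>>     api.SetSourceResolution(70)
>>>>>     ot = api.GetUTF8Text()  # this call is the bottleneck
>>>>>
>>>>> api.End()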
>>>>>
>>>>>
>>>>> My problem is that the api.GetUTF8Text() call is quite slow and, moreover,
>>>>> it gets slower and slower over time. Are there any options to make
>>>>> recognition faster? I have tried resizing the image to around 50x10 px.
>>>>> The times start at around 300 ms but then rise above 1 s, which is too
>>>>> slow for me. I tried both the legacy and LSTM algorithms, but they are
>>>>> similar.
>>>>>
>>>>> best regards
>>>>> Jan
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a53b4b25-97e3-47dc-823a-cbb219225eed%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a53b4b25-97e3-47dc-823a-cbb219225eed%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>
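
Jan's observation that the first 10-15 calls stay under 500 ms while later ones rise above 1 s can be checked by timing every recognition call separately. The following is a minimal sketch, not something posted in the thread: time_calls and drift_ratio are hypothetical helper names, and the recognizer is passed in as a plain callable so the harness itself has no OCR dependency.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def time_calls(recognize: Callable[[object], str], images: Iterable[object]) -> List[float]:
    """Return the wall-clock duration in seconds of each recognize(image) call."""
    durations = []
    for image in images:
        start = time.perf_counter()
        recognize(image)
        durations.append(time.perf_counter() - start)
    return durations


def drift_ratio(durations: List[float], window: int = 10) -> float:
    """Mean of the last `window` durations divided by the mean of the first `window`.

    A value well above 1.0 suggests that calls are getting slower over time.
    """
    if len(durations) < 2 * window:
        raise ValueError("need at least 2 * window measurements")
    return mean(durations[-window:]) / mean(durations[:window])


# Hypothetical hookup with tesserocr, mirroring the loop from the original post:
#
#     def recognize(im):
#         api.SetImage(im)
#         return api.GetUTF8Text()
#
#     durations = time_calls(recognize, images)
#     print(drift_ratio(durations))
```

Logging the individual durations alongside the ratio also shows whether the slowdown is gradual or starts abruptly after a certain number of calls.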

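Lorenzo's first suggestion, ruling out throttling caused by overheating or insufficient power, can be checked on a Raspberry Pi with the firmware tool vcgencmd, whose get_throttled command prints a line such as throttled=0x50005. The sketch below decodes that value. It assumes the bit layout given in the Raspberry Pi documentation, and decode_throttled and read_throttled are hypothetical helper names.

```python
import subprocess
from typing import List

# Bit positions as documented for `vcgencmd get_throttled` (assumed layout).
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Turn a line like 'throttled=0x50005' into a list of active flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


def read_throttled() -> List[str]:
    """Run vcgencmd (only available on a Raspberry Pi) and decode its output."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return decode_throttled(result.stdout)
```

If read_throttled() returns an empty list both before and after a long recognition run, throttling is unlikely to explain the growing call times.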
