Re: [tesseract-ocr] How to optimize tesseract to maximum speed for single number (several digits) recognition

Jan Pohanka Tue, 29 Jan 2019 22:52:20 -0800

It is 4.0. I'm satisfied with the recognition results, but I need to make it 
faster (consistently below 1 s per call)...


On Wednesday, 30 January 2019 7:48:23 UTC+1, zdenop wrote:
>
> What is your tesseract version?
>
> Zdenko
>
>
> On Tue, 29 Jan 2019 at 20:40, Jan Pohanka <[email protected]> wrote:
>
>> Thanks for the suggestions. You are right that I'm referring to the 
>> api.GetUTF8Text() call; it is my bottleneck.
>> I was not aware that there are fast and best models in Tesseract 4.0; 
>> I will give them a try. So far I have used just lang=eng or osd.
>> What seems suspicious to me is that the calls get longer over time. 
>> To be more precise, the first 10-15 calls take up to 500 ms and later 
>> ones rise above 1 s...
>> Moving SetSourceResolution outside of the loop unfortunately makes no 
>> difference.
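To put a number on the "first 10-15 calls vs. later calls" observation, one option is to time every call and compare the average of the last few calls with the average of the first few. The sketch below is a hypothetical helper, not part of tesserocr: `time_calls` and `slowdown_ratio` are illustrative names, and the window size of 10 is an assumption.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def time_calls(fn: Callable, inputs: Iterable) -> List[float]:
    """Call fn once per input and return each call's wall-clock time in seconds."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations: List[float], window: int = 10) -> float:
    """Mean of the last `window` durations divided by the mean of the first `window`."""
    if len(durations) < 2 * window:
        raise ValueError("need at least 2 * window measurements")
    return mean(durations[-window:]) / mean(durations[:window])
```

To use it with the loop from the original question, wrap the `SetImage`/`GetUTF8Text` pair in a small function and pass that as `fn`. A ratio well above 1 confirms a progressive slowdown rather than uniformly slow calls.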
>>
>> BR
>> Jan
>>
>> On Tuesday, 29 January 2019 18:08:49 UTC+1, Lorenzo Blz wrote:
>>>
>>>
>>> First, double-check whether the Pi is throttling due to overheating or 
>>> insufficient USB power. This may cause the slowdown.
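On Raspberry Pi OS the firmware reports throttling and under-voltage through `vcgencmd get_throttled`, which prints a bit field such as `throttled=0x50005`. A minimal sketch for decoding it follows. The bit meanings come from the Raspberry Pi firmware documentation; `decode_throttled` and `read_throttled` are hypothetical helper names, and `read_throttled` only works on a Pi with `vcgencmd` installed.

```python
import subprocess
from typing import List

# Bit meanings of the `vcgencmd get_throttled` value (Raspberry Pi firmware).
THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Turn a line such as 'throttled=0x50005' into the list of set flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [label for bit, label in sorted(THROTTLE_BITS.items())
            if value & (1 << bit)]


def read_throttled() -> List[str]:
    """Query the firmware; requires vcgencmd, i.e. a Raspberry Pi."""
    result = subprocess.run(["vcgencmd", "get_throttled"],
                            capture_output=True, text=True, check=True)
    return decode_throttled(result.stdout)
```

An empty list from `read_throttled()` means the firmware reports neither throttling nor under-voltage. Any of the "has occurred" flags means it happened at some point since boot, which could explain calls getting slower as the board warms up.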
>>>
>>> Usually 30-50 px of text height is fine. If the problem is Tesseract 
>>> itself, try the fast model (or the "normal" one if you are using best). 
>>> I assume you are using the 4.x release.
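To apply the "30-50 px of text height" rule of thumb, you can compute a resize factor from the measured height of the digits. The sketch below assumes a target of 40 px (the midpoint of that range) and that you already know the text height in pixels. Both functions are hypothetical helpers.

```python
from typing import Tuple


def scale_for_text_height(text_height_px: float, target_px: float = 40.0) -> float:
    """Factor by which to resize an image so its text is about target_px tall."""
    if text_height_px <= 0:
        raise ValueError("text height must be positive")
    return target_px / text_height_px


def resized_size(size: Tuple[int, int], text_height_px: float,
                 target_px: float = 40.0) -> Tuple[int, int]:
    """New (width, height) after scaling, rounded to whole pixels."""
    factor = scale_for_text_height(text_height_px, target_px)
    width, height = size
    return round(width * factor), round(height * factor)
```

For example, a 50x10 px image whose digits fill the full 10 px height would need a factor of 4, giving 200x40 px. The result can be passed to Pillow's `Image.resize` before `api.SetImage`.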
>>>
>>> Try tesseract -v to see if you are using all the available CPU 
>>> optimizations.
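On x86 builds, the Tesseract 4.x version output lists detected SIMD extensions on lines such as ` Found AVX2`. A minimal sketch that collects those names is shown below. The exact output layout is an assumption based on typical 4.x builds; the helper names are hypothetical. Both stdout and stderr are read because not every build prints the version to the same stream.

```python
import re
import subprocess
from typing import List


def simd_features(version_text: str) -> List[str]:
    """Collect names from lines like ' Found AVX2' in `tesseract -v` output."""
    return re.findall(r"^\s*Found (\S+)", version_text, flags=re.MULTILINE)


def tesseract_version_text() -> str:
    """Run `tesseract -v`; requires the tesseract binary on PATH."""
    result = subprocess.run(["tesseract", "-v"],
                            capture_output=True, text=True, check=True)
    return result.stdout + result.stderr
```

Call `simd_features(tesseract_version_text())` on the target machine. An empty list only means no `Found ...` lines were printed, which can happen when the build has no SIMD path for that CPU.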
>>>
>>> Try moving SetSourceResolution outside the loop and see if it changes 
>>> anything (maybe it invalidates some caches).
>>>
>>> The time you are referring to is one single api.GetUTF8Text() call, 
>>> correct?
>>>
>>>
>>> Lorenzo
>>>
>>>
>>> On Tue, 29 Jan 2019 at 17:48, Jan Pohanka <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm making a simple device that recognizes numbers in pictures taken 
>>>> by a webcam. Everything runs on a Raspberry Pi 3.
>>>> It all boils down to the following simple loop (in Python for 
>>>> simplicity, but it is the same with the C++ API); the images are 
>>>> preprocessed to black and white:
>>>>
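>>>> from tesserocr import PyTessBaseAPI, PSM
>>>>
>>>> # Create the API once and reuse it for every image.
>>>> api = PyTessBaseAPI(psm=PSM.SINGLE_WORD)
>>>>
>>>> for im in images:  # images: preprocessed black-and-white PIL images
>>>>     api.SetImage(im)
>>>>     api.SetSourceResolution(70)
>>>>     ot = api.GetUTF8Text()
>>>>
>>>> api.End()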
>>>>
>>>>
>>>> My problem is that the api.GetUTF8Text() call is quite slow and, 
>>>> moreover, it gets slower and slower over time. Are there any options 
>>>> to make recognition faster? I have tried resizing the image to around 
>>>> 50x10 px. The times start at around 300 ms but then go above 1 s, 
>>>> which is too slow for me. I tried both the legacy and LSTM engines, 
>>>> but they perform similarly.
>>>>
>>>> best regards
>>>> Jan
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/a53b4b25-97e3-47dc-823a-cbb219225eed%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a53b4b25-97e3-47dc-823a-cbb219225eed%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>
>
