[ https://issues.apache.org/jira/browse/TIKA-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284419#comment-17284419 ]
Luís Filipe Nassif edited comment on TIKA-3300 at 2/18/21, 1:12 PM:
--------------------------------------------------------------------
I also set OMP_THREAD_LIMIT=1 because my app is already multithreaded (it OCRs
many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if
the client app is single-threaded, I would keep the default value, so that
tesseract uses multiple threads to OCR each submitted file. Maybe only
tika-server should set this to 1 by default?
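
For illustration, a minimal sketch of how a Java caller might apply this limit
per tesseract child process. It assumes tesseract is on the PATH and is started
through ProcessBuilder; the class and method names (TesseractLauncher,
singleThreadedTesseract) are hypothetical and are not Tika's actual code.

```java
import java.util.Map;

public class TesseractLauncher {

    /**
     * Builds (but does not start) a tesseract command whose OpenMP thread
     * count is capped at 1. Hypothetical helper, for illustration only.
     */
    static ProcessBuilder singleThreadedTesseract(String inputImage, String outputBase) {
        // Assumes the "tesseract" executable is on the PATH.
        ProcessBuilder pb = new ProcessBuilder("tesseract", inputImage, outputBase);

        // The environment map only affects the child process, not the JVM.
        Map<String, String> env = pb.environment();
        env.put("OMP_THREAD_LIMIT", "1");

        // Merge stderr into stdout so one reader can drain the output.
        pb.redirectErrorStream(true);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = singleThreadedTesseract("page.png", "page");
        System.out.println(pb.command()
                + " OMP_THREAD_LIMIT=" + pb.environment().get("OMP_THREAD_LIMIT"));
    }
}
```

A caller that already runs many OCR jobs in parallel would call start() on each
returned builder from its own worker threads; a single-threaded caller would
simply skip the environment override and keep the default.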
was (Author: lfcnassif):
I also set OMP_THREAD_LIMIT=1 because my app is already multithreaded (it OCRs
many files simultaneously). That gave me about a 2x-2.5x overall speedup. But if
the client app is single-threaded, I would keep the default value, so that
tesseract uses multiple threads to OCR each submitted file. Maybe tika-server
and tika-app should set this?
> Figure out if we can improve tesseract parallelization
> -------------------------------------------------------
>
> Key: TIKA-3300
> URL: https://issues.apache.org/jira/browse/TIKA-3300
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> https://github.com/tesseract-ocr/tesseract/issues/2609
> https://twitter.com/jbaiter_/status/1360266497864704008?s=20
> Not sure if this affects us? h/t [~jbaiter]