Hi Hans,
  You inspired me to document my thoughts on this:
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
Please let us know if you have any questions.

      Best,

            Tim

On Wed, Apr 15, 2020 at 11:16 AM <hans.mei...@avident-it.se> wrote:

> Hi
>
> I have encountered an issue with Tika running locally on a box: the Java
> runtime goes to over 200% CPU after a bulk load of documents that ran for
> a couple of days and covered more than 3 million documents.
>
> Memory consumption does not seem to be an issue.
>
> I had 3 processes running against it, processing various documents.
>
> Processing stalled and the Java process went to over 200% CPU.
>
> It recovered after I restarted the Tika server.
>
> Are there any known issues where the CPU stalls at over 200% and
> processing does not seem to resume?
>
> If so, are there any configuration settings that could be adjusted at
> startup (Java heap, etc.)?
>
> I could not find specific logs to attach, but if there are any that would
> be interesting to see, let me know.
>
> Details:
>
> Tika version is 1.4
>
> I enclose the XML configuration file.
>
> It is running on a Debian system (stretch), single node:
>
> Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64
> GNU/Linux
>
> Distributor ID: Debian
> Description:    Debian GNU/Linux 9.9 (stretch)
> Release:        9.9
> Codename:       stretch
>
> MemTotal:        4032120 kB
>
> Kind regards
>
> Hans
>
