In short, are you running tika-server in --spawnChild mode?  In that mode
you can set the maximum number of files to process before the child process
is restarted. This guards against slowly building memory leaks, and the
server will also restart the child if one of its threads hits an infinite
loop.
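
If you start the server from a script, here is a rough sketch of what that
could look like. It assumes a tika-server 1.x release that supports the
spawn-child option. The flag spellings (-spawnChild, -maxFiles,
-taskTimeoutMillis) are as I recall them from the 1.x server documentation,
and the jar name, file limit and timeout are placeholders, so please check
everything against the server's help output for your version.

```python
def build_tika_server_command(jar_path, max_files=100000, task_timeout_ms=120000):
    """Build the argv list for starting tika-server with a supervised child.

    Flag names are assumed from the tika-server 1.x documentation and
    should be verified against the help output of the installed version.
    """
    return [
        "java", "-jar", jar_path,
        "-spawnChild",                               # parent process watches a child
        "-maxFiles", str(max_files),                 # restart child after this many files
        "-taskTimeoutMillis", str(task_timeout_ms),  # restart child if one parse runs too long
    ]


# "tika-server.jar" and the limits below are placeholder values.
command = build_tika_server_command("tika-server.jar", max_files=50000)
print(" ".join(command))
```

The list can be handed to subprocess.Popen(command) as is. Clients should
expect the server to be briefly unavailable while the child restarts, and
retry.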

On Wed, Apr 15, 2020 at 11:16 AM <hans.mei...@avident-it.se> wrote:

> Hi
>
> I have encountered an issue with Tika running locally on a box: the Java
> runtime goes up to over 200% CPU after running a bulk load of documents
> over a couple of days, more than 3 million documents in total.
>
> But memory consumption does not seem to be an issue.
>
>
>
> I had 3 processes running against it processing various documents.
>
>
>
> It stalled and went up to over 200% CPU on the Java process.
>
> It was OK again after restarting the Tika server.
>
>
>
> Are there any known issues with CPU hot spots where it stalls at over 200%
> and does not seem to get back to processing?
>
> If so, are there any configuration settings that could be adjusted at
> startup (Java heap, etc.)?
>
>
>
> I could not find specific logs to attach, but if there are any that could
> be interesting to see, let me know.
>
>
>
> Details:
>
>
>
> Tika version is 1.4
>
> I enclose the xml configuration file.
>
>
>
> It is running on a Debian system (stretch), single node:
>
>
>
> Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64
> GNU/Linux
>
> Distributor ID: Debian
>
> Description:    Debian GNU/Linux 9.9 (stretch)
>
> Release:        9.9
>
> Codename:       stretch
>
>
>
> MemTotal:        4032120 kB
>
>
>
>
>
> Kind regards
>
> Hans
>
