Hi Hans,

You inspired me to document my thoughts on this:
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika

Please let us know if you have any questions.

Best,
Tim

On Wed, Apr 15, 2020 at 11:16 AM <hans.mei...@avident-it.se> wrote:

> Hi
>
> I have encountered an issue with Tika running locally on a box: the
> Java runtime goes up to over 200% CPU after running a bulk load of
> documents over a couple of days (more than 3 million documents).
>
> Memory consumption does not seem to be an issue.
>
> I had 3 processes running against it, processing various documents.
>
> It stalled and went up to over 200% CPU on the Java process.
>
> It was OK again after restarting the Tika server.
>
> Are there any known issues with CPU spikes where it stalls at over 200%
> and does not seem to get back to processing?
>
> If so, are there any configuration settings that could be adjusted at
> startup (Java heap, etc.)?
>
> I could not find specific logs to attach, but if there are any that could
> be interesting to see, let me know.
>
> Details:
>
> Tika version is 1.4
>
> I enclose the xml configuration file.
>
> It is running on a Debian system (stretch), single node:
>
> Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64
> GNU/Linux
>
> Distributor ID: Debian
> Description: Debian GNU/Linux 9.9 (stretch)
> Release: 9.9
> Codename: stretch
>
> MemTotal: 4032120 kB
>
> Kind regards
>
> Hans