Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread hans.meijer
Thanks Nick, I will do that. Unfortunately, since Tika occupied all the CPU and stopped all other processing, I had to restart it. That solved the issue, but I restarted everything, and I can imagine it will happen again. When it does, I will take a thread dump of it. I will also investigate if it was som

Re: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Tim Allison
I very much like Eric's ideas of recipes and possibly code because of the differences in capabilities available via the various cloud providers. On Thu, Apr 16, 2020 at 10:11 AM Chris Mattmann wrote: > Yes, some of us have been developing an Elastic scaling stack for Tika > server… > > > > That

Re: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Chris Mattmann
Yes, some of us have been developing an Elastic scaling stack for Tika server… That does just that with AWS. Don’t have it ready to push upstream yet. Cheers, Chris From: Eric Pugh Reply-To: "dev@tika.apache.org" Date: Thursday, April 16, 2020 at 7:09 AM To: "dev@tika.apache.org" S

Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Eric Pugh
Does anyone have a good example of combining Tika with some sort of pool of Docker containers? I think a lot of folks treat their Tika server like a pet, not like a cow. https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Nick Burch
On Wed, 15 Apr 2020, hans.mei...@avident-it.se wrote: I have encountered an issue with Tika running locally on a box: the Java runtime goes up to over 200% CPU after running a bulk load of documents over a couple of days (more than 3 million documents). Can you do a thread dump to sh

Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Tim Allison
In short, are you running tika-server in --spawnChild mode? You can set the maximum number of files to process before it restarts the child process. This prevents slowly building memory leaks, and it will also restart the child if one of the threads hits an infinite loop. On Wed, Apr 15, 2020 at 11:16 AM
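
As a rough sketch of the setup Tim describes, the snippet below assembles a launch command for tika-server with child spawning and a cap on files processed before the child is restarted. It assumes the Tika 1.x command-line spellings `-spawnChild` and `-maxFiles` (the message above writes `--spawnChild`). The helper name, the jar file name and the limit of 50000 are hypothetical and only for illustration, so check the options your tika-server version actually accepts.

```python
from typing import List


def build_tika_server_command(jar_path: str, max_files: int) -> List[str]:
    """Build an argv list that starts tika-server in child-spawning mode.

    Assumption: the flag spellings below follow the Tika 1.x CLI and are
    not confirmed by the thread itself.
    """
    if max_files <= 0:
        raise ValueError("max_files must be a positive integer")
    return [
        "java", "-jar", jar_path,
        "-spawnChild",               # parent process watches and restarts the child
        "-maxFiles", str(max_files), # restart the child after this many files
    ]


if __name__ == "__main__":
    # Hypothetical jar name and limit; pass the list to subprocess.Popen to launch.
    command = build_tika_server_command("tika-server.jar", 50000)
    print(" ".join(command))
```

Keeping the command as a list rather than a single string lets it be handed to `subprocess.Popen` without shell quoting concerns.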

Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Tim Allison
Hi Hans, You inspired me to document my thoughts on this: https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika Please let us know if you have any questions. Best, Tim On Wed, Apr 15, 2020 at 11:16 AM wrote: > Hi > > I have encountered an issue with