Hi, all,

Long time user, first time caller... I've been using Solr on and off since 2008.

We've identified a potential resource leak in the task management subsystem
that we believe is causing long-running nodes to crash. Before raising a bug, I
thought I should check with this list whether it is already known or whether
I'm barking up the wrong tree.

Essentially, whenever a query task ends abnormally, i.e. the client times out
and closes the connection, the query hits the timeAllowed or cpuAllowed limit,
or the task is cancelled through the
/solr/collection/tasks/cancel?queryUUID= mechanism, the task is never, or
almost never, removed from the list of tasks returned by the
/v2/collections/collection/tasks/list endpoint.

We also suspect that other resources are not always released in these
circumstances, even after hours or days, because the heap continues to grow far
more than the size of the task list alone would explain. The task list itself
grows without bound, so iterating it takes longer and longer. Eventually the
number of waiting tasks grows to the point where the node becomes unresponsive
and restarts. We also see inconsistent lists of tasks on each node as this
happens.

Because we don't fail many transactions, this takes approx 2 months to become a
problem on our prod nodes, so I wrote a repro script against one of our
collections. I ran it locally on a three-node SolrCloud, each node with 4GB of
heap available and approx 1GB of data, deliberately CPU constrained so that
tasks would time out. After I stopped the script, the task list held over
10,000 entries, none of which were active. Leaving the nodes running for a
further 12 hours saw no reduction in the number of listed tasks.
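
For anyone who wants to try something similar, a repro along these lines can be
sketched as below. This is a minimal illustration, not the script we actually
ran. The helper names are hypothetical, and the canCancel=true parameter and
the "taskList" response key are assumptions about how the task management API
behaves, so please check them against your Solr version. The list path is the
one quoted above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, collection, query, query_uuid, time_allowed_ms):
    """Build a select URL that is registered as a task and likely to time out.

    Assumption: canCancel=true is what makes Solr track the query as a task.
    """
    params = urllib.parse.urlencode({
        "q": query,
        "canCancel": "true",
        "queryUUID": query_uuid,
        "timeAllowed": time_allowed_ms,  # kept tiny so the query is cut short
    })
    return f"{base_url}/solr/{collection}/select?{params}"


def build_task_list_url(base_url, collection):
    """Build the task list URL, using the path quoted in this thread."""
    return f"{base_url}/v2/collections/{collection}/tasks/list"


def count_listed_tasks(response, key="taskList"):
    """Count entries in a parsed task list response.

    Assumption: the tasks sit under a top-level "taskList" key.
    """
    return len(response.get(key) or {})


def fetch_json(url, timeout_s=5.0):
    """GET a URL and parse the body as JSON (needs a running node)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)
```

The idea is to fire many build_query_url() requests with a very small
timeAllowed, stop, and then poll count_listed_tasks(fetch_json(...)) on each
node over time to see whether the count ever drops.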

I've tried this against both the 9.4 we run in production and 9.8.0, to see
whether it has improved. Although 9.8 is noticeably faster (nice work), the
same thing happens.

Any ideas?

Thanks in advance,



Reuben







Reuben Thompson

VP Product Innovation


e: reuben.thomp...@acresoftware.com

w: acresoftware.com


