Hi all,

Long-time user, first-time caller... I've been using Solr on and off since 2008.
We've identified a potential resource leak in the task management subsystem that we believe is causing crashes on long-running nodes. Before raising a bug, I thought I should check with this list whether it is a known issue or whether I'm barking up the wrong tree.

Essentially, whenever a query task ends abnormally, i.e. the client times out and closes the connection, the query hits the timeAllowed or cpuAllowed limit, or the task is cancelled through the /solr/collection/tasks/cancel?queryUUID= mechanism, the task is never, or almost never, removed from the list of tasks returned by the /v2/collections/collection/tasks/list endpoint. We also suspect that other resources are not always released in these circumstances, even after hours or days, because the heap continues to grow far more than the size of the task list would explain.

This leads to an ever-increasing number of tasks in the list, so iterating it takes longer and longer. Eventually the slowdown is such that the number of waiting tasks grows until the node becomes unresponsive and restarts. We also see inconsistent task lists across nodes as this happens.

Because this takes approximately 2 months to become a problem on our prod nodes (we don't fail many transactions), I wrote a repro script against one of our collections and ran it on a local three-node SolrCloud, each with 4GB of heap available and approx 1GB of data, deliberately CPU constrained so that tasks would time out. After I stopped the script, the task list contained over 10,000 entries, none of which were active. Leaving the nodes running for a further 12 hours saw no reduction in the number of listed tasks.

I've tried this against both the 9.4 we run in production and 9.8.0, to see whether it has improved. Although 9.8 is noticeably faster (nice work), the same thing happens.

Any ideas?
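
For anyone wanting to reproduce this, here is a minimal sketch of the kind of repro loop described above. It is not the actual script. It assumes a node at localhost:8983, a placeholder collection name, that queries are registered as tasks by passing canCancel=true with a queryUUID, and that the task list response carries its entries under a "taskList" key. The helper names are hypothetical, so adjust all of this to your setup.

```python
import json
import urllib.parse
import urllib.request
import uuid

SOLR_BASE = "http://localhost:8983"  # assumption: local SolrCloud node
COLLECTION = "collection"            # placeholder collection name


def select_url(base, collection, query, time_allowed_ms, query_uuid):
    """Build a select URL for a cancellable query with a tiny timeAllowed."""
    params = urllib.parse.urlencode({
        "q": query,
        "timeAllowed": time_allowed_ms,  # low value so the query is cut short
        "canCancel": "true",             # assumption: registers the query as a task
        "queryUUID": query_uuid,
    })
    return f"{base}/solr/{collection}/select?{params}"


def task_list_url(base, collection):
    """Build the v2 task list URL mentioned in the post."""
    return f"{base}/v2/collections/{collection}/tasks/list"


def count_tasks(response_body):
    """Count entries in a task list response.

    Assumption: entries live under a "taskList" key; a missing key counts as 0.
    """
    payload = json.loads(response_body)
    return len(payload.get("taskList") or {})


def fetch(url, timeout_s):
    """GET a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def run(iterations=100):
    """Fire short-lived queries, then report how many tasks remain listed."""
    for _ in range(iterations):
        url = select_url(SOLR_BASE, COLLECTION, "*:*", 1, str(uuid.uuid4()))
        try:
            # Short client timeout, so some requests are abandoned client-side.
            fetch(url, timeout_s=0.5)
        except OSError:
            pass  # timeouts and HTTP errors are expected here
    body = fetch(task_list_url(SOLR_BASE, COLLECTION), timeout_s=10)
    print("listed tasks:", count_tasks(body))
```

Calling run() against a test cluster and then polling the task list again some hours later should show whether the count ever drops.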
Thanks in advance,
Reuben

Reuben Thompson
VP Product Innovation
e: reuben.thomp...@acresoftware.com
w: acresoftware.com