On 12/5/22 03:07, Nick Vladiceanu wrote:
> The problem we face is when we try to reload the collection, in sync mode we’re
> getting timed out or forever running task if reload executed in async mode:

Are you by chance using deleteByQuery? If you are, you may be running into a problem that deleteByQuery (dBQ) can cause.

Basically, that kind of delete does not mix well with Lucene's built-in segment merging. If a segment merge happens at the same time as a dBQ, then all subsequent index updates are put on hold until the merge finishes. Sometimes a merge can take a REALLY long time, especially if it is explicitly kicked off as a Solr "optimize" operation. Large segment merges happen without an optimize too, so this can happen even if you never use optimize.

It is entirely possible that in Solr 9.x (which uses Lucene 9.x), Lucene prevents the current searcher from shutting down until the merge finishes, whereas in 8.x Lucene allows the searcher to be shut down and the merge to be killed.

If this is what is happening (and I do not know enough about Lucene internals to say for sure whether it could be the problem), replacing the delete by query with a query that gathers the ID values, followed by a delete by ID, will almost certainly fix it.
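
As a rough illustration of that replacement, here is a minimal Python sketch that talks to Solr's HTTP API using only the standard library. It is not taken from any real indexing software. It assumes the uniqueKey field is named "id", and that the base URL passed in points at the collection. The function names (http_json, collect_ids, delete_by_ids) and the batch sizes are made up for the example. It pages through the matching documents with cursorMark, then sends the IDs to the update handler in batches.

```python
import json
import urllib.parse
import urllib.request


def http_json(url, payload=None):
    """Send a GET (or a JSON POST when payload is given) and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_ids(base_url, query, id_field="id", rows=1000, send=http_json):
    """Page through all matches with cursorMark and return their ID values.

    Assumption: id_field is the collection's uniqueKey field.
    """
    ids, cursor = [], "*"
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "fl": id_field,             # only the ID is needed
            "rows": rows,
            "sort": f"{id_field} asc",  # cursorMark needs a sort that includes the uniqueKey
            "cursorMark": cursor,
            "wt": "json",
        })
        reply = send(f"{base_url}/select?{params}")
        ids.extend(doc[id_field] for doc in reply["response"]["docs"])
        next_cursor = reply.get("nextCursorMark", cursor)
        if next_cursor == cursor:       # an unchanged cursor means no more results
            return ids
        cursor = next_cursor


def delete_by_ids(base_url, ids, batch=500, send=http_json):
    """Delete documents by ID in batches; committing is left to the caller."""
    for start in range(0, len(ids), batch):
        send(f"{base_url}/update", {"delete": ids[start:start + batch]})
```

Something like delete_by_ids(url, collect_ids(url, "type:old")) would then stand in for the old delete by query, where url is a hypothetical collection URL such as http://localhost:8983/solr/mycollection. One difference to keep in mind: documents that start matching the query after the IDs have been gathered are not deleted by this approach.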

I ran into issues with combining dBQ with optimize way back in Solr 3.x or 4.x, with Solr indexes at a job that I no longer have. And once I discovered the issue, I replaced every usage of deleteByQuery in the indexing software, and never had a problem again. I had never tried an index reload while that was happening, though.

Thanks,
Shawn
