Thank you, Shawn, for sharing; that is indeed useful information. However, I must say that we only use deleteById and never deleteByQuery. We also rely only on automatic segment merging and never issue the optimize command.
Thanks,
---
Nick Vladiceanu
vladicean...@gmail.com

> On 20. Dec 2022, at 11:52, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 12/5/22 03:07, Nick Vladiceanu wrote:
>> The problem we face is when we try to reload the collection: in sync mode we're getting timed out, or a forever-running task if the reload is executed in async mode.
>
> Are you by chance using deleteByQuery? If you are, there is a possibility that you're running into a problem that dBQ can cause.
>
> Basically, that kind of delete does not mix well with Lucene's built-in segment merging. If a segment merge happens at the same time as a dBQ, then all subsequent index updates are put on hold until the merge finishes. Sometimes a merge can take a REALLY long time, especially if it is explicitly kicked off as a Solr "optimize" operation. Large segment merges happen without an optimize too, so this can happen even if you never use optimize.
>
> It is entirely possible that if this happens in Solr 9.x (which uses Lucene 9.x), Lucene prevents the shutdown of the current searcher until the merge finishes, whereas in 8.x, shutting down the searcher and killing the merge is allowed by Lucene.
>
> If this is what is happening, and I do not know enough about Lucene internals to say for sure whether it could be the problem, changing a delete by query into a query (to gather ID values) followed by a delete by ID will almost certainly fix it.
>
> I ran into issues with combining dBQ with optimize way back in Solr 3.x or 4.x, with Solr indexes at a job that I no longer have. Once I discovered the issue, I replaced every usage of deleteByQuery in the indexing software and never had a problem again. I had never tried an index reload while that was happening, though.
>
> Thanks,
> Shawn
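
In case it helps anyone who finds this thread later, a rough SolrJ sketch of the pattern Shawn describes above (run the query to gather the unique IDs, then delete by ID instead of deleteByQuery) could look something like the following. The collection URL, the query string, and the "id" field name are placeholders rather than anything from our actual setup; adapt them to your own schema and client configuration.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class DeleteByIdSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder collection URL and delete criteria.
            String url = "http://localhost:8983/solr/mycollection";
            String deleteQuery = "category:discontinued";

            try (SolrClient client = new Http2SolrClient.Builder(url).build()) {
                // Step 1: page through the matching documents with a cursor,
                // fetching only the unique key field ("id" here).
                SolrQuery q = new SolrQuery(deleteQuery);
                q.setFields("id");
                q.setRows(1000);
                q.setSort("id", SolrQuery.ORDER.asc); // cursors require a sort on the unique key

                List<String> ids = new ArrayList<>();
                String cursor = CursorMarkParams.CURSOR_MARK_START;
                while (true) {
                    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                    QueryResponse rsp = client.query(q);
                    for (SolrDocument doc : rsp.getResults()) {
                        ids.add((String) doc.getFieldValue("id"));
                    }
                    String next = rsp.getNextCursorMark();
                    if (next.equals(cursor)) {
                        break; // cursor did not advance: all matches collected
                    }
                    cursor = next;
                }

                // Step 2: delete by ID instead of deleteByQuery, so index
                // updates are not held up behind a running segment merge.
                if (!ids.isEmpty()) {
                    client.deleteById(ids);
                    client.commit();
                }
            }
        }
    }

The cursor-based paging is only there to avoid deep paging on large result sets; plain start/rows paging would work too, since all IDs are collected before any delete is issued.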