This should have been fixed in the latest version. I have not tested it, but I experienced the same issue in version 9.1.1. The workaround was to add as many warming queries <https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#query-related-listeners> as possible to slow down the replica recovery process. This lets the core warm up in the meantime. Requests aren't routed to it, because the replica's state in ZooKeeper (zk) stays in recovery until the warming queries have been replayed, and that helps reduce the 5xx errors.
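For a rolling restart, one way to confirm that a restarted node has actually left recovery before you stop the next one is to inspect the Collections API CLUSTERSTATUS response. The helper below is a minimal, hypothetical sketch. The name `replicas_not_ready` is made up for illustration. It assumes the parsed JSON has the shape `cluster.collections.<collection>.shards.<shard>.replicas.<replica>`, with `node_name` and `state` fields, plus a top-level `cluster.live_nodes` list. The HTTP call itself is left out.

```python
from typing import Any


def replicas_not_ready(status: dict[str, Any], collection: str,
                       node_name: str) -> list[tuple[str, str]]:
    """Return sorted (shard, replica) pairs on node_name that are not serving yet.

    Assumption: `status` is the parsed JSON body of a CLUSTERSTATUS call.
    A replica counts as ready only if its state is "active" AND its node
    is listed in live_nodes (a stopped node can still show stale states).
    """
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards = cluster["collections"][collection]["shards"]

    pending = []
    for shard_name, shard in shards.items():
        for replica_name, replica in shard.get("replicas", {}).items():
            if replica.get("node_name") != node_name:
                continue  # only look at replicas hosted on the restarted node
            if replica.get("state") != "active" or node_name not in live_nodes:
                pending.append((shard_name, replica_name))
    return sorted(pending)
```

You could poll this until it returns an empty list and use that as a simple gate between node restarts. How often to poll and how long to wait before giving up are left open here.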
Related JIRAs, fixed and resolved as of 9.6.1, so this should have been fixed in 9.6.1:
https://issues.apache.org/jira/browse/SOLR-17049
https://issues.apache.org/jira/browse/SOLR-17200

On Tue, Sep 3, 2024 at 12:47 AM Saksham Gupta <saksham.gu...@indiamart.com.invalid> wrote:
> Hi there,
> Looking for a graceful way to restart a node in solr9, please help!
>
> On Mon, Sep 2, 2024 at 5:21 PM Saksham Gupta <saksham.gu...@indiamart.com> wrote:
> > Hi All,
> > We have encountered an issue while upgrading our Solr cloud from v8.10 to v9.6.1. We use a collection with 56 shards [each having a single replica], hosted across a cluster of 8 nodes. Solr queries contain the _route_ parameter to decide which shards/replicas will be used for the respective query.
> >
> > *Problem Description:*
> >
> > In solr8, if we stopped/restarted a node for a maintenance activity, we did not encounter any 5xx, but on solr9 we are facing multiple 5xx.
> > On further debugging, we noticed that all the queries being routed to *different shards of a single node* [which is stopped] are returning 5xx, which was not the case in solr8. When we replicated this on solr8, the queries simply returned with the partialResults=true flag and 0 results.
> > It is important to note that if a solr9 query is routed to multiple shards, at least one of which is on another node, it is returned with partialResults=true and status=0. 5xx is returned only if the query is routed exclusively to shards of a single node [which is stopped].
> >
> > *1. Is anyone aware of any changes made in solr9 to handle such cases?*
> > *2. Is there a way to avoid 5xx in these cases on solr9 as well?*
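With a single replica per shard, stopping a node takes every shard it hosts offline. A query whose _route_ resolves only to those shards then has no replica to fall back on, which fits the 5xx pattern described above. The sketch below lists, before a node is stopped, which shards would be left without a live, active replica. It is a minimal, hypothetical example. The name `shards_lost_if_stopped` is made up for illustration. It assumes the same parsed CLUSTERSTATUS JSON shape (`node_name`, `state`, `cluster.live_nodes`), and it does not resolve _route_ values to shards.

```python
from typing import Any


def shards_lost_if_stopped(status: dict[str, Any], collection: str,
                           node_name: str) -> list[str]:
    """Return sorted shard names with no active replica outside node_name.

    Assumption: `status` is the parsed JSON body of a CLUSTERSTATUS call.
    A replica "survives" only if it is active and on a node that stays live.
    """
    cluster = status["cluster"]
    # Nodes that would still be live after node_name is stopped.
    remaining = set(cluster.get("live_nodes", [])) - {node_name}

    lost = []
    for shard_name, shard in cluster["collections"][collection]["shards"].items():
        survivors = [
            r for r in shard.get("replicas", {}).values()
            if r.get("node_name") in remaining and r.get("state") == "active"
        ]
        if not survivors:
            lost.append(shard_name)
    return sorted(lost)
```

In a layout like the one described (56 shards, one replica each, 8 nodes), this would return every shard on the node being stopped. Adding a second replica on another node is what would make a shard drop off the list.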