Chris M. Hostetter created SOLR-17656:
-----------------------------------------

             Summary: Add expert level option to allow PULL replicas to go 
ACTIVE w/o RECOVERING
                 Key: SOLR-17656
                 URL: https://issues.apache.org/jira/browse/SOLR-17656
             Project: Solr
          Issue Type: New Feature
            Reporter: Chris M. Hostetter
            Assignee: Chris M. Hostetter


When a Solr cluster undergoes a rolling restart (or some other "catastrophic" 
failure that requires or causes Solr node restarts), there can be a snowball 
effect of poor performance (or even Solr nodes crashing), because fewer than 
normal replicas are serving query requests while replicas on restarting nodes 
are DOWN or RECOVERING. This is especially true if shard leaders are also 
affected and the restarting replicas must first wait for a leader election 
before they can recover (or wait to finish recovery from an overworked leader).

For NRT-type use cases, RECOVERING is a necessary evil to ensure every replica 
is up to date before handling NRT requests. PULL replicas, however, are 
expected to routinely "lag" behind their leader, and I've talked to a lot of 
Solr users with use cases where they would be happy to have PULL replicas back 
online serving "stale" data ASAP, letting normal IndexFetching "catch up" with 
the leader later.

I propose we support a new "advanced" replica property that expert-level users 
can set on PULL replicas to indicate that, on (re)init, these replicas may skip 
RECOVERING and go directly to ACTIVE.
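
To make the proposed behavior concrete, here is a minimal sketch of the 
decision a replica might make on (re)init. It is a standalone model, not Solr 
code: the property name "skipRecoveryOnInit" and the function 
state_after_init are hypothetical, since this issue does not name the 
property or describe an implementation. Only the replica types (NRT, TLOG, 
PULL) and states (DOWN, RECOVERING, ACTIVE) come from Solr itself.

```python
from enum import Enum


class ReplicaType(Enum):
    NRT = "NRT"
    TLOG = "TLOG"
    PULL = "PULL"


class ReplicaState(Enum):
    DOWN = "down"
    RECOVERING = "recovering"
    ACTIVE = "active"


# Hypothetical property name; the issue does not specify one.
SKIP_RECOVERY_PROP = "skipRecoveryOnInit"


def state_after_init(replica_type, properties):
    """Return the state a replica would enter on (re)init under the proposal.

    Only PULL replicas that explicitly opt in skip RECOVERING; every other
    replica keeps the existing behavior and recovers first.
    """
    opted_in = str(properties.get(SKIP_RECOVERY_PROP, "false")).lower() == "true"
    if replica_type is ReplicaType.PULL and opted_in:
        # Serve possibly stale data now; normal index fetching catches up later.
        return ReplicaState.ACTIVE
    return ReplicaState.RECOVERING
```

In this sketch the property is ignored on NRT and TLOG replicas, and the 
default (property absent) leaves today's behavior unchanged, which matches 
the intent that this be an opt-in, expert-level setting.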


