[ https://issues.apache.org/jira/browse/SOLR-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139180#comment-14139180 ]
Ramkumar Aiyengar commented on SOLR-6530:
-----------------------------------------
In general, it's leader-initiated recovery, so if I am not the leader, I
shouldn't be running that logic for any operation. For now that probably only
matters for commit, since commits aren't forwarded to the leader, but if some
future operation also doesn't have to be coordinated by the leader, could it
use the same check?
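
A minimal sketch of the check suggested above, assuming the node already knows
its own name and the current shard leader's name. The class and method names
are hypothetical and are not taken from the attached patch or from Solr's API.

```java
import java.util.Objects;

/** Hypothetical sketch; not Solr's actual API. */
final class RecoveryGuard {
    private RecoveryGuard() {}

    /**
     * Decides whether the local node may put another replica into recovery
     * after failing to distribute an operation to it. The check does not
     * depend on which operation failed (commit or anything else): only the
     * shard leader is allowed to initiate recovery.
     */
    static boolean mayInitiateRecovery(String localNode, String shardLeader) {
        Objects.requireNonNull(localNode, "localNode");
        Objects.requireNonNull(shardLeader, "shardLeader");
        return localNode.equals(shardLeader);
    }
}
```

Because the guard looks only at leadership and not at the operation type, any
later operation that bypasses the leader would be covered without further
changes.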
> Commits under network partition can put any node in down state by any node
> --------------------------------------------------------------------------
>
> Key: SOLR-6530
> URL: https://issues.apache.org/jira/browse/SOLR-6530
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Priority: Critical
> Fix For: 5.0, 6.0
>
> Attachments: SOLR-6530.patch, SOLR-6530.patch
>
>
> Commits are executed by any node in SolrCloud, i.e., they're not routed via
> the leader like other updates.
> # Suppose there's 1 collection, 1 shard, 2 replicas (A and B) and A is the
> leader
> # Suppose a commit request is made to node B at a time when B cannot
> talk to A due to a partition for any reason (failing switch, heavy GC,
> whatever)
> # B fails to distribute the commit to A (times out) and asks A to recover
> # This was okay earlier because a leader just ignores recovery requests, but
> with the leader-initiated recovery code, B puts A in the "down" state and A
> can never get out of that state.
> tl;dr: During network partitions, if enough commit/optimize requests are sent
> to the cluster, all the nodes in the cluster will eventually be marked as
> "down".
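
The scenario in the quoted description can be modelled with a toy simulation.
This is only an illustration under simplifying assumptions (every other node is
unreachable from the node receiving the commit, and node state is a plain
string); the class, its fields, and the leaderOnlyRecovery flag are all
hypothetical and do not correspond to Solr code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the scenario; not Solr code. All names are hypothetical. */
final class PartitionedCommitDemo {
    private final String leader;
    private final boolean leaderOnlyRecovery;
    private final String[] nodes;
    private final Map<String, String> state = new HashMap<>();

    PartitionedCommitDemo(String leader, boolean leaderOnlyRecovery, String... nodes) {
        this.leader = leader;
        this.leaderOnlyRecovery = leaderOnlyRecovery;
        this.nodes = nodes.clone();
        for (String n : this.nodes) {
            state.put(n, "active");
        }
    }

    /**
     * The receiving node tries to distribute a commit while partitioned from
     * every other node, so each distribution attempt fails. Without the
     * leader-only guard, the receiver marks each unreachable node "down".
     */
    void commitDuringPartition(String receiver) {
        boolean allowed = !leaderOnlyRecovery || receiver.equals(leader);
        for (String other : nodes) {
            if (!other.equals(receiver) && allowed) {
                state.put(other, "down");
            }
        }
    }

    String stateOf(String node) {
        return state.get(node);
    }
}
```

With leaderOnlyRecovery set to false, a commit sent to B while A is the leader
leaves A marked "down", which is the reported bug. With it set to true, the
same commit leaves A "active", and only a commit handled by the leader itself
can mark another replica "down".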