[ 
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218097#comment-14218097
 ] 

Erick Erickson commented on SOLR-6691:
--------------------------------------

[~noble.paul] [~markrmiller]

OK, I'm working this out (slowly). Here's the deal though. I don't see a 
graceful way of telling a node that is _currently_ a leader to stop being 
leader. Oh, and it must re-insert itself at the end of the leader-elector 
queue. I don't really want to down the node, that seems far too harsh, but 
perhaps it's not.

Also, here's what I'm trying at this point (I'll improve it if necessary 
before committing the patch).

For leader rebalancing, basically just delete the leader ephemeral election 
node. I'd like the leader node itself to do this but I don't yet see a clean 
way to inform the current leader it should abdicate that role. I can do this 
from anywhere, but it seems cleaner if the core itself does it.

1> each node is watching the one before it. So when the leader ephemeral node 
disappears, the next node gets the event and looks through the queue to see if 
some _other_ node is preferred leader. If so, it puts itself at the end of the 
leader election queue and does _not_ become leader. But it does remove its own 
ephemeral node so the next node in the chain gets that event and so on.

1a> I'm having trouble having the leader that's abdicating get the message that 
it should abdicate its role. I'm trying to have the leader watch its own 
ephemeral node, is there a better way?
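
The hand-off described in 1> can be sketched as a toy model. This is plain 
Python over an in-memory queue, not Solr's leader-election code or any 
ZooKeeper API. The function name `rebalance` and its parameters are made up 
for illustration, and the sketch assumes the abdicating leader has already 
managed to rejoin at the tail (the part 1a> is still asking about).

```python
from collections import deque


def rebalance(queue, preferred):
    """Toy model of deleting the leader's ephemeral node.

    queue:     node names in election order; queue[0] is the current leader.
               Assumed non-empty.
    preferred: name of the preferred leader, or None (hypothetical parameter).
    Returns (new_queue, events); new_queue[0] is the new leader.
    """
    q = deque(queue)
    events = []

    # The leader's ephemeral node goes away and the old leader rejoins
    # at the end of the queue.
    old_leader = q.popleft()
    q.append(old_leader)
    events.append(f"{old_leader} abdicates and rejoins at the end")

    # Each successor is notified in turn. It steps aside only if some
    # *other* node in the queue is the preferred leader.
    for _ in range(len(q)):
        candidate = q[0]
        if candidate == preferred or preferred not in q:
            events.append(f"{candidate} becomes leader")
            break
        q.rotate(-1)  # candidate removes its node and rejoins at the end
        events.append(f"{candidate} steps aside and rejoins at the end")

    return list(q), events


new_queue, events = rebalance(["node1", "node2", "node3", "node4"], "node3")
for line in events:
    print(line)
print(new_queue)
```

With node3 as preferred leader, node2 is the only node that steps aside, so 
the cost of a rebalance in this model is one extra hop per node sitting 
between the old leader and the preferred one.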

Note that the only place this really produces churn is when a 
BALANCESHARDUNIQUE is issued and then immediately a REBALANCELEADERS is issued. 
Otherwise, when cores are loaded, if they are the preferred leader they insert 
themselves at the head of the leader-elector queue so REBALANCELEADERS in that 
case shouldn't cause any unnecessary churn.

As you can tell, I'm a bit stymied, I'll plug along but wondered if there's 
some prior art I haven't found yet.

Thanks!

> REBALANCELEADERS needs to change the leader election queue.
> -----------------------------------------------------------
>
>                 Key: SOLR-6691
>                 URL: https://issues.apache.org/jira/browse/SOLR-6691
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> The original code (SOLR-6517) assumed that changes in the clusterstate after 
> issuing a command to the overseer to change the leader indicated that the 
> leader was successfully changed. Fortunately, Noble clued me in that this 
> isn't the case and that the potential leader needs to insert itself in the 
> leader election queue before triggering the change leader command.
> Inserting themselves in the front of the queue should probably happen in 
> BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
> [~noble.paul] Do evil things happen if a node joins at the head but it's 
> _already_ in the queue? These ephemeral nodes in the queue are watching each 
> other. So if node1 is the leader you have
> node1 <- node2 <- node3 <- node4
> where <- means "watches".
> Now, if node3 puts itself at the head of the list, you have
> {code}
> node1 <- node2
>       <- node3 <- node4
> {code}
> I _think_ when I was looking at this it all "just worked". 
> 1> node 1 goes down. Nodes 2 and 3 duke it out but there's code to ensure 
> that node3 becomes the leader and node2 inserts itself at the end so it's 
> watching node 4.
> 2> node 2 goes down, nobody gets notified and it doesn't matter.
> 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
> inserting itself at the end of the list.
> 4> node 4 goes down, nobody gets notified and it doesn't matter.
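
The four cases in the description above can be checked against a small model 
of the watch relationships. This is a sketch in plain Python, not Solr or 
ZooKeeper code; the `watches` dict and the `notified` helper are hypothetical 
names, and the model only answers "who gets an event", not what each node 
does next.

```python
# Watch graph after node3 joins at the head: watcher -> node being watched.
# node2 and node3 both watch node1; node4 still watches node3.
watches = {
    "node2": "node1",
    "node3": "node1",
    "node4": "node3",
}


def notified(watches, down):
    """Return the nodes that receive an event when `down` disappears."""
    return sorted(w for w, target in watches.items() if target == down)


for node in ["node1", "node2", "node3", "node4"]:
    print(node, "goes down ->", notified(watches, node))
```

In this model only node1 and node3 have watchers, which matches cases 2> and 
4> where nobody is notified.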



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
