[ https://issues.apache.org/jira/browse/SOLR-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479705#comment-17479705 ]

Mark Robert Miller commented on SOLR-15672:
-------------------------------------------

That was likely an early motivation/thought while prototyping. It really ended 
up kind of stuck there due to its value in very simply allowing one to count on 
zk to ensure there is only one leader, and every other potential leader wannabe 
or bug is forced to reckon with zk's simplest recipe - the “dumb” distributed 
lock, which is fantastic for smaller numbers. You make the node, you get the 
lock; otherwise you have to retry or knowingly cheat and delete.
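
To make that recipe concrete, here is a minimal sketch against the plain 
ZooKeeper client - the path and identity bytes are made up, and this is not 
Solr's actual election code, just the "make the node, you get the lock" shape:

{code:java}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkLock {
  private final ZooKeeper zk;
  private final String lockPath; // hypothetical, e.g. "/collections/c1/leaders/shard1"

  public SimpleZkLock(ZooKeeper zk, String lockPath) {
    this.zk = zk;
    this.lockPath = lockPath;
  }

  /** Blocks until this process creates the lock znode, i.e. "wins". */
  public void acquire(byte[] myIdentity) throws Exception {
    while (true) {
      try {
        // Ephemeral: the znode vanishes if our session dies, freeing the lock.
        zk.create(lockPath, myIdentity, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        return; // we made the node, we hold the lock
      } catch (KeeperException.NodeExistsException e) {
        // Someone else holds it: wait for the node to go away, then retry.
        CountDownLatch gone = new CountDownLatch(1);
        if (zk.exists(lockPath, event -> gone.countDown()) == null) {
          continue; // it disappeared between create and exists, retry now
        }
        gone.await();
      }
    }
  }
}
{code}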

You can of course go about that in other ways, but in the face of the 
surrounding landscape, that's the kind of thing that let zk form some sort of 
backstop.

Would you want to put it in the cluster state? How complicated would it be to 
achieve the same role of being the “fail-safe” potential leader election (only 
entered after thinking you'd win the standard zk election and then passing a 
shard sync check)? I dunno.

I looked at making changes to it myself. Personally, I would not have brought 
it into state.json; I had separated collection structure from state (state.json 
is not a great name for the structure, but …) and so I looked at different 
things. I really didn't like that you had to read a full znode for such simple 
data.

I had pushed most consumption from zk into the zkstatereader. It had one 
server-side, recursive watcher and just got the state change events streamed to 
it as they happened, distributing them out via publish/subscribe-style 
callbacks. So znode name changes were ridiculously cheap - simple string 
watcher events streaming down a persistent connection. As elections happened or 
leaders registered, the zkstatereader was just watching it go by. Except the 
actual leader for this case is in the znode's data, not the znode's name. So I 
was trying to get the byte array out of there and just have the leader 
indicated by the znode name. It ended up being pretty tricky though, and I 
pulled back, because even with a hugely more stable and vetted leader election 
process, I could find difficult-to-test-and-address corner cases where the 
current “distributed lock” properties were what stood in the way of a data loss 
mistake. A full or partial cluster restart during a lot of activity where an 
overseer changed was the scariest and toughest I'd hit, but there was certainly 
scope for more, as long as the proper scale and scenarios matched with the 
right code to spot and verify the right things.
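
For anyone who hasn't seen that shape before, a rough sketch of a single 
persistent recursive watch fanning events out to subscribers - this uses the 
ZooKeeper 3.6+ addWatch API, the class and method names are illustrative, and 
it is not the actual branch code:

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.zookeeper.AddWatchMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class RecursiveStateWatcher {
  public interface Listener {
    void onEvent(WatchedEvent event);
  }

  private final List<Listener> listeners = new CopyOnWriteArrayList<>();

  /** Register one persistent, recursive watch under the given root. */
  public void start(ZooKeeper zk, String root) throws Exception {
    // PERSISTENT_RECURSIVE: the watch survives firing and covers all
    // descendants, so every create/delete/data-change under root streams here.
    zk.addWatch(root, this::dispatch, AddWatchMode.PERSISTENT_RECURSIVE);
  }

  public void subscribe(Listener l) {
    listeners.add(l);
  }

  private void dispatch(WatchedEvent event) {
    // Publish/subscribe fan-out: every interested component gets the raw
    // stream of path change events (cheap string events, no data reads).
    for (Listener l : listeners) {
      l.onEvent(event);
    }
  }
}
{code}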

So yeah, redundant info, things I don't like about it - I can't say any 
direction would be a good or bad one, it depends on a whole lot either way. I 
did end up keeping redundancy myself: I kept the leader and replica state in 
the “structure” json file vs forcing humans and every consumer to reconstruct 
the full view themselves or go through the right intermediary. But I only 
notified consumers to take a look on structure changes, not every time it was 
updated, and it wasn't updated based on client activity - the overseer chose. 
For code needing the live stream, the zkstatereader supplied that to them by 
keeping the structure from the last structure change it was told to get, plus 
the stream of state changes it saw from its recursive watcher (except it did 
still take a peek at leader nodes explicitly to read the byte array containing 
the winner's name :(). 
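
To spell out that last parenthetical, a tiny sketch of the difference between 
reading the leader's identity out of the znode's data vs deriving it from the 
znode name - the paths and layout here are hypothetical, not Solr's real ones:

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

public class LeaderLookup {
  /** Leader identity in the znode's data: needs an extra getData round trip. */
  static String leaderFromData(ZooKeeper zk, String leaderPath) throws Exception {
    byte[] data = zk.getData(leaderPath, false, null);
    return new String(data, StandardCharsets.UTF_8);
  }

  /** Leader identity encoded in the znode name: the watch event alone suffices. */
  static String leaderFromName(String leaderZnodePath) {
    // e.g. ".../leaders/shard1/core_node_5" -> "core_node_5" (hypothetical layout)
    return leaderZnodePath.substring(leaderZnodePath.lastIndexOf('/') + 1);
  }
}
{code}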

> Leader Election is flawed. 
> ---------------------------
>
>                 Key: SOLR-15672
>                 URL: https://issues.apache.org/jira/browse/SOLR-15672
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Robert Miller
>            Priority: Major
>
> Filing this not as a work item I’m assigning to myself, but to note an open 
> issue where some notes can accumulate. 


