[ https://issues.apache.org/jira/browse/SOLR-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479705#comment-17479705 ]
Mark Robert Miller commented on SOLR-15672:
-------------------------------------------

That was likely an early motivation/thought while prototyping. It really ended up kind of stuck there due to its value in very simply allowing one to count on ZK to ensure there is only one leader: every other potential leader wannabe, or bug, is forced to reckon with ZK's simplest recipe, the "dumb" distributed lock, which is fantastic at smaller scale. If you create the node, you get the lock; otherwise you have to retry, or knowingly cheat and delete it (see the first sketch below). You can of course go about that in other ways, but in the face of the surrounding landscape, that's the kind of thing that let ZK form some sort of backstop.

Would you want to put it in the cluster state? How complicated would that be to achieve the same role of being the "fail safe" potential leader election (only entered after thinking you'd win the standard ZK election and then passing a shard sync check)? I dunno.

I looked at making changes to it myself. Personally, I would not have brought it into state.json; I had separated collection structure from state (state.json is not a great name for the structure, but …) and so I looked at different things. I really didn't like that you had to read a full znode for such simple data.

I had pushed most consumption from ZK into the ZkStateReader. It had one server-side, recursive watcher and just got the state change events streamed to it as they happened, distributing them out via publish/subscribe-style callbacks (second sketch below). So znode name changes were ridiculously cheap: simple string watcher events streaming down a persistent connection. As elections happened or leaders registered, the ZkStateReader was just watching it go by. Except the actual leader, in this case, is in the znode's data, not the znode's name. So I was trying to get the byte array out of there and just have the leader indicated by the znode name (third sketch below). It ended up being pretty tricky though, and I pulled back, because even with a hugely more stable and vetted leader election process, I could find difficult-to-test-and-address corner cases where the current "distributed lock" properties stood in the way of a data loss mistake. A full or partial cluster restart during a lot of activity where an overseer changed was the scariest and toughest I'd hit, but there was certainly scope for more, as long as the proper scale and scenarios matched the right code to spot and verify the right things.

So yeah, redundant info, things I don't like about it. I can't say any direction would be a good or a bad one; it depends on a whole lot either way. I did end up keeping redundancy myself: I kept the leader and replica states in the "structure" JSON file rather than force humans and every consumer to reconstruct the full view themselves or go through the right intermediary. But I only notified consumers to take a look on structure changes, not every time it was updated, and it wasn't updated based on client activity; the Overseer chose. For code needing the live stream, the ZkStateReader supplied that by keeping the structure from the last structure change it was told to fetch, plus the stream of state changes it saw from its recursive watcher (except it did still take a peek at leader nodes explicitly to read the byte array containing the winner's name :( ).
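For illustration only, here is a minimal sketch of that "dumb" lock recipe using the plain ZooKeeper client. This is not Solr's actual election code; the path and id parameters are made up. Whoever creates the ephemeral node wins; everyone else gets NodeExistsException and has to retry or knowingly delete.

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch: claim leadership by creating an ephemeral znode.
// Creating the node == holding the lock. Anyone else hits
// NodeExistsException and must retry, or "knowingly cheat" and delete.
public class LeaderClaimSketch {
  public static boolean tryClaim(ZooKeeper zk, String leaderPath, String myId)
      throws KeeperException, InterruptedException {
    try {
      zk.create(leaderPath, myId.getBytes(StandardCharsets.UTF_8),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;  // we created the node, so we are the leader
    } catch (KeeperException.NodeExistsException e) {
      return false; // someone else holds the leader node; retry later
    }
  }
}
{code}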
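A rough sketch of the one-recursive-watcher idea, assuming ZooKeeper 3.6+'s persistent recursive watches; the real ZkStateReader wiring is more involved, and the class and method names here are illustrative:

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import org.apache.zookeeper.AddWatchMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

// Sketch: one persistent recursive watch streams every change under a
// subtree down a single connection; subscribers get the events fanned
// out publish/subscribe style.
public class StateStreamSketch {
  private final List<Consumer<WatchedEvent>> subscribers = new CopyOnWriteArrayList<>();

  public void subscribe(Consumer<WatchedEvent> s) { subscribers.add(s); }

  public void start(ZooKeeper zk, String root) throws KeeperException, InterruptedException {
    // Watcher is a functional interface, so a lambda works here.
    zk.addWatch(root, event -> {
      // Created/deleted/changed events arrive as they happen; when the
      // interesting data is encoded in node names, event.getPath()
      // alone tells the story, with no extra data reads.
      for (Consumer<WatchedEvent> s : subscribers) s.accept(event);
    }, AddWatchMode.PERSISTENT_RECURSIVE);
  }
}
{code}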
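To make the data-vs-name distinction concrete: today the leader's identity lives in the znode's byte[] data, so learning it costs a full read of the node; if the winner were encoded in the znode name, a cheap children listing (or the watch event path itself) would be enough. The paths and the "leader-" naming scheme below are hypothetical, not Solr's real layout.

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class LeaderLookupSketch {
  // Current approach: read the znode's bytes to learn the leader.
  static String leaderFromData(ZooKeeper zk, String leaderPath)
      throws KeeperException, InterruptedException {
    byte[] data = zk.getData(leaderPath, false, null);
    return new String(data, StandardCharsets.UTF_8);
  }

  // The alternative being described: encode the winner in the znode
  // name, so no data read is needed at all.
  static String leaderFromName(ZooKeeper zk, String electionPath)
      throws KeeperException, InterruptedException {
    for (String child : zk.getChildren(electionPath, false)) {
      if (child.startsWith("leader-")) {   // hypothetical naming scheme
        return child.substring("leader-".length());
      }
    }
    return null; // no leader registered yet
  }
}
{code}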
> Leader Election is flawed.
> ---------------------------
>
>                 Key: SOLR-15672
>                 URL: https://issues.apache.org/jira/browse/SOLR-15672
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Robert Miller
>            Priority: Major
>
> Filing this not as a work item I'm assigning to myself, but to note an open
> issue where some notes can accumulate.