[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565914#comment-14565914 ]
Joe Stein commented on KAFKA-1778:
----------------------------------

Hey, sorry for the late reply.

I have now seen, on a few dozen clusters, a broker get into a state where the controller is hung and the only recourse is either to delete the /controller znode from ZooKeeper to force a re-election or to shut down the broker.

In the former case, I have seen one situation where the entire cluster went down. I am fairly certain this was because of the version of ZooKeeper they were running (3.4.5); however, I have never tried to reproduce it.

In the latter case, many folks don't want to shut down the broker because they are in high-traffic situations, and doing so could be a lot worse than the controller not working. Sometimes that changes and they shut the broker down so the controller can fail over and, as an example, their partition reassignment can continue to the new brokers they just launched.

So, originally we were thinking of fixing this by having an admin call that could safely trigger another leader election. We have been finding, though, that just having the broker start without ever being able to be the controller (can.be.controller = false) is preferable in *a lot* of cases. This way there are brokers that will never be the controller and some that could be, and one of the brokers that could be would be elected.

~ Joestein

> Create new re-elect controller admin function
> ---------------------------------------------
>
>                 Key: KAFKA-1778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1778
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Joe Stein
>            Assignee: Abhishek Nigam
>             Fix For: 0.8.3
>
>
> kafka --controller --elect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
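
To make the eligibility idea in the comment concrete, here is a minimal sketch of how an opt-out flag could interact with a first-writer-wins election on /controller. It is a toy in-memory model, not Kafka code: `Broker`, `FakeZk`, and `elect` are hypothetical names, and `can_be_controller` only mirrors the proposed `can.be.controller` setting, which is a suggestion in this thread rather than an existing broker config.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Broker:
    broker_id: int
    # Hypothetical flag mirroring the proposed can.be.controller setting.
    can_be_controller: bool = True


class FakeZk:
    """In-memory stand-in for the single /controller znode (assumption)."""

    def __init__(self) -> None:
        self.controller_id: Optional[int] = None

    def try_create_controller(self, broker_id: int) -> bool:
        # Mimics "create if absent": only the first caller wins.
        if self.controller_id is None:
            self.controller_id = broker_id
            return True
        return False

    def delete_controller(self) -> None:
        # Models removing /controller to force a re-election.
        self.controller_id = None


def elect(zk: FakeZk, brokers: List[Broker]) -> Optional[int]:
    """Let every eligible broker race for /controller, in list order."""
    for broker in brokers:
        if not broker.can_be_controller:
            continue  # opted-out brokers never attempt the election
        zk.try_create_controller(broker.broker_id)
    return zk.controller_id


# Broker 1 has opted out, so broker 2 wins the first election.
brokers = [Broker(1, can_be_controller=False), Broker(2), Broker(3)]
zk = FakeZk()
first = elect(zk, brokers)

# Simulate the hung controller going away: clear the znode and re-elect
# among the remaining brokers. Broker 1 is still never chosen.
zk.delete_controller()
second = elect(zk, [b for b in brokers if b.broker_id != first])
print(first, second)  # prints "2 3"
```

In this model a re-election only ever lands on a broker that opted in, which is the property the comment is after: high-traffic brokers can be kept out of the controller role entirely, so restarting a controller-capable broker is less disruptive.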