[ https://issues.apache.org/jira/browse/KAFKA-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191094#comment-15191094 ]
Jun Rao commented on KAFKA-3215:
--------------------------------

[~fpj], thanks for the analysis. Yes, it does seem that this issue is fixed in 0.9.0.

> controller may not be started when there are multiple ZK session expirations
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3215
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3215
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Flavio Junqueira
>              Labels: controller
>
> Suppose that broker 1 is the controller and it has 2 consecutive ZK session
> expirations. In this case, two ZK session expiration events will be fired.
> 1. When handling the first ZK session expiration event,
> SessionExpirationListener.handleNewSession() can elect broker 1 itself as the
> new controller and initialize the states properly.
> 2. When handling the second ZK session expiration event,
> SessionExpirationListener.handleNewSession() first calls
> onControllerResignation(), which sets ReplicaStateMachine.hasStarted to
> false. It then continues with controller election in
> ZookeeperLeaderElector.elect() and tries to create the controller node in ZK.
> This fails since broker 1 has already registered itself as the controller
> node in ZK. In this case, we simply ignore the failure to create the
> controller node since we assume the controller must be on another broker.
> However, in this case, the controller is broker 1 itself, but
> ReplicaStateMachine.hasStarted is still false.
> 3. Now, if a new broker event is fired, we will ignore the event in
> BrokerChangeListener.handleChildChange since ReplicaStateMachine.hasStarted
> is false. We are now in a situation in which a controller is alive, but won't
> react to any broker change event.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
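
For readers following the three steps in the issue description, the sequence can be replayed with a small toy model. This is only a sketch and not Kafka code: `ToyController`, the `zk` dict standing in for ZooKeeper, and all method names are hypothetical stand-ins for the listeners and flags named in the description (`handleNewSession()`, `onControllerResignation()`, `elect()`, `ReplicaStateMachine.hasStarted`, `BrokerChangeListener.handleChildChange`). It assumes the controller node created while handling the first event still exists when the second event is handled.

```python
class ToyController:
    """Toy model of one broker's controller-side state (hypothetical, not Kafka code)."""

    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk                # dict standing in for ZooKeeper
        self.has_started = False    # stands in for ReplicaStateMachine.hasStarted
        self.handled_events = []    # broker change events that were acted on

    def on_controller_resignation(self):
        # Resigning shuts the state machine down.
        self.has_started = False

    def elect(self):
        # Creating the controller node fails if the node already exists.
        if "controller" in self.zk:
            # The failure is ignored on the assumption that another broker
            # holds the node; nobody checks whether the holder is this broker.
            return False
        self.zk["controller"] = self.broker_id
        self.has_started = True
        return True

    def handle_new_session(self):
        # Models the session expiration handler: resign first, then re-elect.
        self.on_controller_resignation()
        self.elect()

    def handle_broker_change(self, brokers):
        # Models the broker change listener: events are dropped unless started.
        if not self.has_started:
            return False
        self.handled_events.append(list(brokers))
        return True


zk = {}                          # the old ephemeral controller node is already gone
controller = ToyController(1, zk)

controller.handle_new_session()  # step 1: first expiration event
after_first = controller.has_started

controller.handle_new_session()  # step 2: second expiration event
stuck = zk.get("controller") == 1 and not controller.has_started

reacted = controller.handle_broker_change([1, 2])  # step 3: new broker event
```

In this model, `stuck` captures the state described in step 2: broker 1 owns the controller node, yet its state machine flag is false, so the broker change in step 3 is silently dropped.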