[ https://issues.apache.org/jira/browse/KAFKA-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190794#comment-15190794 ]
Flavio Junqueira edited comment on KAFKA-3215 at 3/11/16 4:34 PM:
------------------------------------------------------------------

[~junrao] Let me see if I understand this issue correctly.

bq. broker 1 is the controller and it has 2 consecutive ZK session expirations

As I understand it, one possible run that reflects this is the following:

# zkclient creates a session S1
# S1 expires
# zkclient queues the session expiration event to deliver to the Kafka broker
# zkclient creates a new session S2
# S2 expires
# zkclient queues the session expiration for S2 and the event for S1 still hasn't been delivered
# zkclient creates a third session S3
# broker 1 processes the session expiration of S1
# broker 1 successfully elects itself leader/controller in session S3
# broker 1 processes session expiration for S2

After this last step, the broker is messed up because the replica state machine isn't properly initialized. Also, the broker won't give up leadership because the ephemeral has been created in the current session.

I think this was a problem in 0.8.2, but not a problem in 0.9 because we fixed it in KAFKA-1387. With ZKWatchedEphemeral, if we find that the znode already exists while creating it, we check whether the existing znode has the same session owner; if it does, the operation returns ok and the controller becomes leader.

Does that make sense?

was (Author: fpj):
[~junrao] Let me see if I understand this issue correctly.

bq. broker 1 is the controller and it has 2 consecutive ZK session expirations

As I understand this, one possible run that reflects this is the following:

# zkclient creates a session S1
# S1 session expires
# zkclient queues the session expiration event to deliver to the kafka broker
# zkclient creates a new session S2
# S2 expires
# zkclient queues the session expiration for S2 and the event for S1 still hasn't been delivered
# zkclient creates a third session S3
# broker 1 processes the session expiration of S1
# broker 1 successfully elects itself leader/controller in session S3
# broker 1 processes session expiration for S2

After this last step, broker S2 is messed up because the replica state machine isn't properly initialized. Also, the broker won't give up leadership because the ephemeral has been created in the current session.

I think this was a problem in 0.8.2, but not a problem in 0.9 because we fixed it in KAFKA-1387. With ZKWatchedEphemeral, in the case we get that the znode exists while creating it, we check if the existing znode has the same session owner, in which case the operation returns ok and the controller becomes leader.

Does it make sense?

> controller may not be started when there are multiple ZK session expirations
> ----------------------------------------------------------------------------
>
> Key: KAFKA-3215
> URL: https://issues.apache.org/jira/browse/KAFKA-3215
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jun Rao
> Assignee: Flavio Junqueira
> Labels: controller
>
> Suppose that broker 1 is the controller and it has 2 consecutive ZK session
> expirations. In this case, two ZK session expiration events will be fired.
> 1. When handling the first ZK session expiration event,
> SessionExpirationListener.handleNewSession() can elect broker 1 itself as the
> new controller and initialize the states properly.
> 2. When handling the second ZK session expiration event,
> SessionExpirationListener.handleNewSession() first calls
> onControllerResignation(), which will set ReplicaStateMachine.hasStarted to
> false. It then continues to do controller election in
> ZookeeperLeaderElector.elect() and tries to create the controller node in ZK.
> This will fail since broker 1 has already registered itself as the controller
> node in ZK. In this case, we simply ignore the failure to create the
> controller node since we assume the controller must be in another broker.
> However, in this case, the controller is broker 1 itself, but the
> ReplicaStateMachine.hasStarted is still false.
> 3. Now, if a new broker event is fired, we will be ignoring the event in
> BrokerChangeListener.handleChildChange since ReplicaStateMachine.hasStarted
> is false. Now, we are in a situation where the controller is alive, but won't
> react to any broker change event.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
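
For anyone following the thread, the run described in the comment can be reproduced with a small in-memory model. This is only a sketch under stated assumptions, not Kafka or zkclient code: FakeZk, Broker, run_scenario and the check_owner flag are hypothetical names made up for illustration. has_started stands in for ReplicaStateMachine.hasStarted, handle_new_session for SessionExpirationListener.handleNewSession(), and check_owner=True approximates the same-session-owner check that the comment attributes to KAFKA-1387.

```python
from collections import deque


class NodeExistsError(Exception):
    """Stand-in for ZooKeeper's 'node already exists' error."""


class FakeZk:
    """Hypothetical in-memory model: one live session, one ephemeral controller znode."""

    def __init__(self):
        self.session = None      # id of the current live session
        self.controller = None   # (broker_id, owner_session) or None

    def expire_and_reconnect(self, new_session):
        # Ephemeral znodes owned by the expiring session are removed.
        if self.controller is not None and self.controller[1] == self.session:
            self.controller = None
        self.session = new_session

    def create_controller_node(self, broker_id):
        if self.controller is not None:
            raise NodeExistsError()
        self.controller = (broker_id, self.session)


class Broker:
    """Hypothetical broker; check_owner toggles the same-session-owner check."""

    def __init__(self, broker_id, zk, check_owner):
        self.broker_id = broker_id
        self.zk = zk
        self.check_owner = check_owner
        self.has_started = False   # models ReplicaStateMachine.hasStarted
        self.pending = deque()     # queued, undelivered session-expiration events

    def handle_new_session(self):
        self.has_started = False   # models onControllerResignation()
        self.elect()

    def elect(self):
        try:
            self.zk.create_controller_node(self.broker_id)
        except NodeExistsError:
            _, owner_session = self.zk.controller
            same_owner = owner_session == self.zk.session
            if not (self.check_owner and same_owner):
                return             # assume another broker is the controller
        self.has_started = True    # models controller initialization


def run_scenario(check_owner):
    zk = FakeZk()
    zk.expire_and_reconnect("S1")            # zkclient creates session S1
    broker = Broker(1, zk, check_owner)
    broker.elect()                           # broker 1 is the controller in S1
    for new_session in ("S2", "S3"):
        zk.expire_and_reconnect(new_session)
        broker.pending.append("expired")     # event queued, not yet delivered
    while broker.pending:                    # both events are processed in S3
        broker.pending.popleft()
        broker.handle_new_session()
    return broker.has_started, zk.controller
```

In this model, run_scenario(check_owner=False) leaves the controller znode owned by broker 1 in S3 while has_started is False, which is the stuck state from the issue description. With check_owner=True, the second election attempt sees that the existing znode belongs to the current session and finishes initialization.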