[ https://issues.apache.org/jira/browse/KAFKA-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Lin updated KAFKA-4443:
----------------------------
    Description:
Currently in onControllerFailover(), the controller starts replicaStateMachine and partitionStateMachine before invoking sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq). However, if a broker starts right after controller election, the LeaderAndIsrRequest sent to follower partitions on this broker will all be ignored because the broker doesn't know the leaders are alive.

To fix this problem, in onControllerFailover(), the controller should send UpdateMetadataRequest to brokers after initializeControllerContext() but before it starts replicaStateMachine and partitionStateMachine. The first UpdateMetadataRequest will include the list of live brokers. Although it will not include partition leader information, that is OK because we will always send UpdateMetadataRequest again when we send LeaderAndIsrRequest during replicaStateMachine.startup() and partitionStateMachine.startup().

  was:
Currently in onControllerFailover(), the controller starts replicaStateMachine and partitionStateMachine before invoking sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq). However, if a broker right after controller election, the LeaderAndIsrRequest sent to follower partitions on this broker will all be ignored because the broker doesn't know the leaders are alive.

To fix this problem, in onControllerFailover(), the controller should send UpdateMetadataRequest to brokers after initializeControllerContext() but before it starts replicaStateMachine and partitionStateMachine. The first UpdateMetadataRequest will include the list of live brokers. Although it will not include partition leader information, that is OK because we will always send UpdateMetadataRequest again when we send LeaderAndIsrRequest during replicaStateMachine.startup() and partitionStateMachine.startup().

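The ordering problem in the description can be illustrated with a toy model. This is only a sketch, not Kafka code: the ToyBroker class, its handler names, the broker IDs, and the partition names are all hypothetical, and the model assumes a freshly started broker ignores a become-follower request whenever the leader is not in its set of known live brokers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the failover ordering; none of these names are real Kafka APIs.
final class FailoverOrderingSketch {

    /** Hypothetical broker that tracks which brokers it believes are alive. */
    static final class ToyBroker {
        final Set<Integer> knownLiveBrokers = new HashSet<>();
        final List<String> followedPartitions = new ArrayList<>();

        /** Stands in for handling an UpdateMetadataRequest carrying live broker IDs. */
        void handleUpdateMetadata(Set<Integer> liveBrokerIds) {
            knownLiveBrokers.clear();
            knownLiveBrokers.addAll(liveBrokerIds);
        }

        /** Stands in for the follower part of a LeaderAndIsrRequest. */
        boolean handleBecomeFollower(String partition, int leaderId) {
            if (!knownLiveBrokers.contains(leaderId)) {
                return false; // leader not known to be alive: request is ignored
            }
            followedPartitions.add(partition);
            return true;
        }
    }

    /** Runs one simulated failover and returns how many partitions the new broker follows. */
    static int failover(boolean sendMetadataFirst) {
        Set<Integer> liveBrokerIds = new HashSet<>(Arrays.asList(1, 2));
        ToyBroker newBroker = new ToyBroker(); // broker 2, started right after election

        if (sendMetadataFirst) {
            newBroker.handleUpdateMetadata(liveBrokerIds); // proposed order
        }
        // State machine startup: broker 2 is told to follow broker 1 for two partitions.
        newBroker.handleBecomeFollower("topic-0", 1);
        newBroker.handleBecomeFollower("topic-1", 1);
        if (!sendMetadataFirst) {
            newBroker.handleUpdateMetadata(liveBrokerIds); // current order: too late
        }
        return newBroker.followedPartitions.size();
    }

    public static void main(String[] args) {
        System.out.println("metadata sent last:  " + failover(false)); // prints 0
        System.out.println("metadata sent first: " + failover(true));  // prints 2
    }
}
```

In this model the only difference between the two runs is when the live broker list reaches the new broker, which is the change the description proposes for onControllerFailover().
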
> Controller should send UpdateMetadataRequest prior to LeaderAndIsrRequest during failover
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4443
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4443
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>              Labels: reliability
>             Fix For: 0.10.1.1
>
>
> Currently in onControllerFailover(), the controller starts
> replicaStateMachine and partitionStateMachine before invoking
> sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq).
> However, if a broker starts right after controller election, the
> LeaderAndIsrRequest sent to follower partitions on this broker will all be
> ignored because the broker doesn't know the leaders are alive.
> To fix this problem, in onControllerFailover(), the controller should send
> UpdateMetadataRequest to brokers after initializeControllerContext() but
> before it starts replicaStateMachine and partitionStateMachine. The first
> UpdateMetadataRequest will include the list of live brokers. Although it will
> not include partition leader information, that is OK because we will always
> send UpdateMetadataRequest again when we send LeaderAndIsrRequest during
> replicaStateMachine.startup() and partitionStateMachine.startup().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)