[ https://issues.apache.org/jira/browse/KAFKA-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666555#comment-15666555 ]
Onur Karaman commented on KAFKA-4410:
-------------------------------------

To reproduce the bug, spin up zookeeper and two kafka brokers:
{code}
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
> export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
{code}

Create a topic with 100 partitions and replication factor 2. This should leave each broker with 50 leader replicas and 50 follower replicas:
{code}
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t --partition 100 --replication-factor 2
Created topic "t".
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe | grep -o "Leader: [0-9]" | sort | uniq -c
  50 Leader: 0
  50 Leader: 1
{code}

Initiate a controlled shutdown of a broker (I chose the non-controller, broker 1). Broker 1's request log shows 99 StopReplicaRequests (api_key=5), almost exactly double its 50 follower replicas:
{code}
> grep "api_key=5" logs1/kafka-request.log | wc -l
99
{code}

The one replica whose request was not doubled (partition 75) had its duplicate request fail to go out because broker 1 had already begun disconnecting from the controller:
{code}
> grep "api_key=5" logs1/kafka-request.log | egrep -o "partition=\d+" | sort | uniq -c
   2 partition=1
   2 partition=11
   2 partition=13
   2 partition=15
   2 partition=17
   2 partition=19
   2 partition=21
   2 partition=23
   2 partition=25
   2 partition=27
   2 partition=29
   2 partition=3
   2 partition=31
   2 partition=33
   2 partition=35
   2 partition=37
   2 partition=39
   2 partition=41
   2 partition=43
   2 partition=45
   2 partition=47
   2 partition=49
   2 partition=5
   2 partition=51
   2 partition=53
   2 partition=55
   2 partition=57
   2 partition=59
   2 partition=61
   2 partition=63
   2 partition=65
   2 partition=67
   2 partition=69
   2 partition=7
   2 partition=71
   2 partition=73
   1 partition=75
   2 partition=77
   2 partition=79
   2 partition=81
   2 partition=83
   2 partition=85
   2 partition=87
   2 partition=89
   2 partition=9
   2 partition=91
   2 partition=93
   2 partition=95
   2 partition=97
   2 partition=99
> grep "fails to send request" logs0/controller.log
[2016-11-15 00:29:42,930] WARN [Controller-0-to-broker-1-send-thread], Controller 0 epoch 1 fails to send request {controller_id=0,controller_epoch=1,delete_partitions=false,partitions=[{topic=t,partition=75}]} to broker localhost:9091 (id: 1 rack: null). Reconnecting to broker. (kafka.controller.RequestSendThread)
{code}

Factoring in the failed StopReplicaRequest, this comes to 99 + 1 = 100 StopReplicaRequests, or 2x the expected 50 (one per follower replica).

> KafkaController sends double the expected number of StopReplicaRequests during controlled shutdown
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4410
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4410
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>
> We expect KafkaController to send one StopReplicaRequest for each follower replica on the broker undergoing controlled shutdown. Examining KafkaController.shutdownBroker, we see that this is not the case:
> 1. KafkaController.shutdownBroker itself sends a StopReplicaRequest for each follower replica.
> 2. KafkaController.shutdownBroker transitions every follower replica to OfflineReplica in its call to replicaStateMachine.handleStateChanges, which also sends a StopReplicaRequest.
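The two steps in the quoted description are easiest to see as the same partition being enqueued twice on the controller-to-broker request channel. Below is a minimal, self-contained Scala sketch of that pattern; the names (sendStopReplicaRequest, transitionToOfflineReplica, shutdownFollowerReplica) are hypothetical stand-ins for the controller code paths named above, not the actual implementation:
{code}
object DoubleStopReplicaSketch {
  final case class TopicAndPartition(topic: String, partition: Int)

  // Stand-in for the controller-to-broker request channel: count the
  // StopReplicaRequests queued per partition.
  val stopReplicaCounts =
    scala.collection.mutable.Map.empty[TopicAndPartition, Int].withDefaultValue(0)

  def sendStopReplicaRequest(tp: TopicAndPartition): Unit =
    stopReplicaCounts(tp) += 1

  // Stand-in for replicaStateMachine.handleStateChanges(..., OfflineReplica):
  // per the description above, the OfflineReplica transition also sends a
  // StopReplicaRequest.
  def transitionToOfflineReplica(tp: TopicAndPartition): Unit =
    sendStopReplicaRequest(tp)

  // Stand-in for the follower-replica branch of KafkaController.shutdownBroker:
  // it sends a StopReplicaRequest directly (step 1) and then transitions the
  // replica to OfflineReplica (step 2), which sends a second one.
  def shutdownFollowerReplica(tp: TopicAndPartition): Unit = {
    sendStopReplicaRequest(tp)
    transitionToOfflineReplica(tp)
  }

  def main(args: Array[String]): Unit = {
    // 50 follower replicas on the broker being shut down, as in the repro.
    val followers = (0 until 50).map(p => TopicAndPartition("t", p))
    followers.foreach(shutdownFollowerReplica)
    // Prints 100: double the expected 50 requests.
    println(stopReplicaCounts.values.sum)
  }
}
{code}
Modulo the one duplicate that failed to go out after broker 1 disconnected, this matches the 99 observed (100 attempted) requests in the logs above.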