[ https://issues.apache.org/jira/browse/KAFKA-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666555#comment-15666555 ]

Onur Karaman commented on KAFKA-4410:
-------------------------------------

To reproduce the bug, spin up ZooKeeper and two Kafka brokers:
{code}
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> export LOG_DIR=logs0 && ./bin/kafka-server-start.sh config/server0.properties
> export LOG_DIR=logs1 && ./bin/kafka-server-start.sh config/server1.properties
{code}
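Here server0.properties and server1.properties are assumed to be copies of config/server.properties that differ only in broker id, listener port, and data directory; a sketch of broker 1's config (broker 0 analogous, with broker.id=0, its own port, and its own log.dirs):
{code}
broker.id=1
listeners=PLAINTEXT://localhost:9091
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181
{code}
The per-partition request counts below also assume kafka.request.logger in config/log4j.properties is set to TRACE so that full requests are written to kafka-request.log.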
Create a topic with 100 partitions and replication factor 2. This should give each 
broker 50 leader replicas and 50 follower replicas:
{code}
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic t --partitions 100 --replication-factor 2
Created topic "t".
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe | grep -o "Leader: [0-9]" | sort | uniq -c
  50 Leader: 0
  50 Leader: 1
{code}
Now perform a controlled shutdown of one of the brokers (I chose the non-controller, broker 1).
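With controlled shutdown enabled, stopping the broker process cleanly is enough to trigger it; a minimal sketch (the pgrep pattern is just an assumption about how broker 1 was started):
{code}
> kill -s TERM $(pgrep -f "kafka\.Kafka.*server1\.properties")
{code}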
After broker 1 shuts down, its request log shows 99 StopReplicaRequests (api_key=5 is 
StopReplica), almost exactly double the 50 follower replicas it was hosting:
{code}
> grep "api_key=5" logs1/kafka-request.log | wc -l
      99
{code}
The one replica that was not doubled (partition 75) had its duplicate request fail 
to go out because broker 1 had already begun disconnecting from the controller:
{code}
> grep "api_key=5" logs1/kafka-request.log | egrep -o "partition=\d+" | sort | 
> uniq -c
   2 partition=1
   2 partition=11
   2 partition=13
   2 partition=15
   2 partition=17
   2 partition=19
   2 partition=21
   2 partition=23
   2 partition=25
   2 partition=27
   2 partition=29
   2 partition=3
   2 partition=31
   2 partition=33
   2 partition=35
   2 partition=37
   2 partition=39
   2 partition=41
   2 partition=43
   2 partition=45
   2 partition=47
   2 partition=49
   2 partition=5
   2 partition=51
   2 partition=53
   2 partition=55
   2 partition=57
   2 partition=59
   2 partition=61
   2 partition=63
   2 partition=65
   2 partition=67
   2 partition=69
   2 partition=7
   2 partition=71
   2 partition=73
   1 partition=75
   2 partition=77
   2 partition=79
   2 partition=81
   2 partition=83
   2 partition=85
   2 partition=87
   2 partition=89
   2 partition=9
   2 partition=91
   2 partition=93
   2 partition=95
   2 partition=97
   2 partition=99

> grep "fails to send request" logs0/controller.log
[2016-11-15 00:29:42,930] WARN [Controller-0-to-broker-1-send-thread], Controller 0 epoch 1 fails to send request {controller_id=0,controller_epoch=1,delete_partitions=false,partitions=[{topic=t,partition=75}]} to broker localhost:9091 (id: 1 rack: null). Reconnecting to broker. (kafka.controller.RequestSendThread)
{code}
Factoring in the failed StopReplicaRequest for partition 75, that makes 99 + 1 = 100 
StopReplicaRequests, i.e. 2x the expected 50 (one per follower replica on broker 1).
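
As the quoted issue description below spells out, the doubling comes from two code paths in KafkaController.shutdownBroker each sending a StopReplicaRequest for the same follower replica. A simplified Scala sketch of that flow (approximate, not the actual controller code):
{code}
// Simplified sketch of the follower-replica branch described in the issue below;
// helper names (allPartitionsOnBroker, isLeader) are illustrative, not the real code.
allPartitionsOnBroker(id).foreach { topicAndPartition =>
  if (isLeader(topicAndPartition, id)) {
    // Leader replicas: leadership is moved to another broker in the ISR.
    partitionStateMachine.handleStateChanges(Set(topicAndPartition), OnlinePartition)
  } else {
    // (1) shutdownBroker itself sends a StopReplicaRequest for the follower replica...
    brokerRequestBatch.newBatch()
    brokerRequestBatch.addStopReplicaRequestForBrokers(Seq(id),
      topicAndPartition.topic, topicAndPartition.partition, deletePartition = false)
    brokerRequestBatch.sendRequestsToBrokers(epoch)

    // (2) ...and the OfflineReplica transition sends another StopReplicaRequest
    //     for the same replica, which is where the doubling comes from.
    replicaStateMachine.handleStateChanges(
      Set(PartitionAndReplica(topicAndPartition.topic, topicAndPartition.partition, id)),
      OfflineReplica)
  }
}
{code}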

> KafkaController sends double the expected number of StopReplicaRequests 
> during controlled shutdown
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4410
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4410
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>
> We expect KafkaController to send one StopReplicaRequest for each follower 
> replica on the broker undergoing controlled shutdown. Examining 
> KafkaController.shutdownBroker, we see that this is not the case:
> 1. KafkaController.shutdownBroker itself sends a StopReplicaRequest for each 
> follower replica
> 2. KafkaController.shutdownBroker transitions every follower replica to 
> OfflineReplica in its call to replicaStateMachine.handleStateChanges, which 
> also sends a StopReplicaRequest.



