[ https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524443#comment-15524443 ]
Joel Koshy commented on KAFKA-4207:
-----------------------------------

I have a KIP draft that has been sitting around for a while. I should be able to clean that up and send it out within the next week or so.

> Partitions stopped after a rapid restart of a broker
> ----------------------------------------------------
>
>                 Key: KAFKA-4207
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4207
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.1, 0.10.0.1
>            Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the broker that
> was just killed
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will
> see under-replicated partitions forever for some partitions that are hosted
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on
> a broker that has initiated a controlled shutdown. For a large number of
> replicas this can take a while. When the broker that is doing the controlled
> shutdown is killed, the StopReplica commands are queued up even though the
> request queue to the broker is cleared. When the broker comes back online,
> the StopReplica commands that were queued get sent to the broker that just
> started up.
> CC: [~junrao] since he's familiar with the scenario seen here

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
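
The sequence described under "Cause" can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's controller code: ToyController, ToyBroker, the two-stage queue (a "pending" backlog feeding a "request_queue"), and the simulate() helper are all hypothetical names invented for illustration. The sketch assumes that killing the broker clears only the request queue, while StopReplica commands still in the backlog survive and are delivered to the restarted broker.

```python
from collections import deque


class ToyController:
    """Toy model of a controller's send path to one broker (hypothetical)."""

    def __init__(self):
        self.pending = deque()        # commands generated but not yet queued for sending
        self.request_queue = deque()  # commands waiting to be sent to the broker

    def start_controlled_shutdown(self, broker_id, partitions):
        # One StopReplica command per replica hosted on the shutting-down broker.
        for partition in partitions:
            self.pending.append(("StopReplica", broker_id, partition))

    def drain(self, limit):
        # Move up to `limit` pending commands into the request queue.
        for _ in range(min(limit, len(self.pending))):
            self.request_queue.append(self.pending.popleft())

    def on_broker_killed(self):
        # Assumption: only the request queue is cleared; the backlog survives.
        self.request_queue.clear()

    def send_all(self, broker):
        # Deliver everything that is still queued, including stale commands.
        self.drain(len(self.pending))
        while self.request_queue:
            broker.handle(self.request_queue.popleft())


class ToyBroker:
    """Toy broker that tracks which replicas it is actively hosting."""

    def __init__(self, broker_id, partitions):
        self.broker_id = broker_id
        self.active = set(partitions)

    def handle(self, command):
        kind, _, partition = command
        if kind == "StopReplica":
            self.active.discard(partition)


def simulate(num_partitions=10, drained_before_kill=3):
    partitions = [f"topic-{i}-0" for i in range(num_partitions)]
    controller = ToyController()
    controller.start_controlled_shutdown(1, partitions)
    controller.drain(drained_before_kill)  # a few commands reach the request queue
    controller.on_broker_killed()          # SIGKILL: request queue is cleared
    restarted = ToyBroker(1, partitions)   # same broker ID, started immediately
    controller.send_all(restarted)         # stale StopReplica commands arrive
    return restarted.active


if __name__ == "__main__":
    print(sorted(simulate()))
```

In this toy run, the commands that were still in the backlog when the broker was killed are applied to the freshly started broker, so it stops most of the replicas it should be serving. That would be consistent with the other brokers seeing those partitions as under-replicated, although the real controller's queueing is more involved than this model.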