Dustin Cote created KAFKA-4207:
----------------------------------
Summary: Partitions stopped after a rapid restart of a broker
Key: KAFKA-4207
URL: https://issues.apache.org/jira/browse/KAFKA-4207
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 0.10.0.1, 0.9.0.1
Reporter: Dustin Cote
Environment:
4 Kafka brokers
10,000 topics with one partition each, replication factor 3
Each partition holds about 4KB of data
No data being produced or consumed
Scenario:
Initiate a controlled shutdown on one broker
Kill that broker with SIGKILL before the controlled shutdown completes
Immediately start a new broker with the same broker ID as the broker that was
just killed
Symptoms:
After the new broker starts, the other three brokers in the cluster see
under-replicated partitions indefinitely for some of the partitions hosted on
the broker that was killed and restarted
Cause:
Today, the controller sends a StopReplica command for each replica hosted on a
broker that has initiated a controlled shutdown. With a large number of
replicas this can take a while. If the broker doing the controlled shutdown is
killed, the remaining StopReplica commands stay queued even though the request
queue to that broker is cleared. When the broker comes back online, those
queued StopReplica commands are sent to the newly started broker, which then
stops the corresponding replicas, leaving those partitions under-replicated.
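The following toy Python model sketches the race described above under stated assumptions. It collapses the controller into a single class, and `ToyController`, `pending_stop_replica`, `request_queue`, and the method names are hypothetical illustrations, not Kafka's actual controller classes or APIs. It only shows why clearing the per-broker request queue is not enough when StopReplica commands generated for the controlled shutdown are still waiting to be enqueued.

```python
from collections import deque


class ToyController:
    """Hypothetical model of the KAFKA-4207 race; not Kafka's real controller."""

    def __init__(self, partitions_by_broker):
        # broker id -> set of partitions that have a replica on that broker
        self.partitions_by_broker = {b: set(p) for b, p in partitions_by_broker.items()}
        # StopReplica commands produced by controlled-shutdown handling that have
        # not yet reached the broker's request queue. Assumption: these are NOT
        # cleared when the broker dies.
        self.pending_stop_replica = {b: deque() for b in partitions_by_broker}
        # Per-broker request queue; cleared when the broker goes away.
        self.request_queue = {b: deque() for b in partitions_by_broker}

    def begin_controlled_shutdown(self, broker_id):
        # One StopReplica command per replica hosted on the broker.
        for partition in sorted(self.partitions_by_broker[broker_id]):
            self.pending_stop_replica[broker_id].append(("StopReplica", partition))

    def drain_some(self, broker_id, n):
        # Move up to n pending commands into the request queue, modelling the
        # slow progress through many replicas.
        pending = self.pending_stop_replica[broker_id]
        for _ in range(min(n, len(pending))):
            self.request_queue[broker_id].append(pending.popleft())

    def broker_killed(self, broker_id):
        # SIGKILL: the request queue is cleared, but pending commands survive.
        self.request_queue[broker_id].clear()

    def broker_started(self, broker_id):
        # A new broker with the same id comes up; the leftover commands flow
        # into its request queue and are delivered to it.
        pending = self.pending_stop_replica[broker_id]
        while pending:
            self.request_queue[broker_id].append(pending.popleft())
        delivered = list(self.request_queue[broker_id])
        self.request_queue[broker_id].clear()
        return delivered


if __name__ == "__main__":
    controller = ToyController({1: ["t0-0", "t1-0", "t2-0", "t3-0"]})
    controller.begin_controlled_shutdown(1)
    controller.drain_some(1, 1)  # only one command reached the request queue
    controller.broker_killed(1)  # request queue cleared; three commands still pending
    print(controller.broker_started(1))
    # [('StopReplica', 't1-0'), ('StopReplica', 't2-0'), ('StopReplica', 't3-0')]
```

In this model the restarted broker receives StopReplica for `t1-0`, `t2-0` and `t3-0` even though it never asked to shut down. That matches the reported symptom of partitions that remain under-replicated on the other brokers.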
CC: [~junrao] since he's familiar with the scenario seen here
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)