[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

Joel Koshy (JIRA) Mon, 26 Sep 2016 16:04:32 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524443#comment-15524443
 ]


Joel Koshy commented on KAFKA-4207:
-----------------------------------

I have a KIP draft that has been sitting around for a while. I should be able 
to clean that up and send it out within the next week or so.

> Partitions stopped after a rapid restart of a broker
> ----------------------------------------------------
>
>                 Key: KAFKA-4207
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4207
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.1, 0.10.0.1
>            Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt the controlled shutdown prior to completion with a SIGKILL
> Immediately start a new broker with the same broker ID as the broker that 
> was just killed
> Symptoms:
> After the new broker starts, the other three brokers in the cluster report 
> under-replicated partitions indefinitely for some of the partitions hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take a while.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands are still queued up even though 
> the request queue to the broker is cleared.  When the broker comes back 
> online, the queued StopReplica commands are sent to the broker that just 
> started up.
> CC: [~junrao] since he's familiar with the scenario seen here
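
The race described under "Cause" can be illustrated with a toy model. This is
only a sketch and not Kafka controller code: ToyController, simulate, and the
partition names are hypothetical, and the model assumes that no StopReplica
command is delivered before the kill. It shows how commands enqueued after the
queue is cleared end up delivered to the restarted broker.

```python
from collections import deque


class ToyController:
    """Hypothetical stand-in for the controller's per-broker request queues."""

    def __init__(self):
        self.queues = {}  # broker_id -> deque of pending requests

    def enqueue(self, broker_id, request):
        # Creates the queue on demand, so a cleared queue silently reappears.
        self.queues.setdefault(broker_id, deque()).append(request)

    def on_broker_failure(self, broker_id):
        # Drops what is queued right now; later enqueues are not prevented.
        self.queues.pop(broker_id, None)

    def drain(self, broker_id):
        # Everything still queued is delivered to whoever owns this broker ID.
        return list(self.queues.pop(broker_id, deque()))


def simulate(num_replicas=10, killed_after=4):
    controller = ToyController()
    broker_id = 1
    partitions = [f"topic-{i}-0" for i in range(num_replicas)]

    # Controlled shutdown: the controller works through the replicas one by one.
    for i, partition in enumerate(partitions):
        if i == killed_after:
            # SIGKILL is noticed mid-way and the request queue is cleared.
            controller.on_broker_failure(broker_id)
        controller.enqueue(broker_id, ("StopReplica", partition))

    # A new broker with the same ID starts and receives the leftover commands.
    stopped = {partition for _, partition in controller.drain(broker_id)}
    return partitions, stopped


if __name__ == "__main__":
    partitions, stopped = simulate()
    print(f"{len(stopped)} of {len(partitions)} replicas stopped on the new broker")
```

In this model, every command enqueued after the queue was cleared reaches the
new broker, which matches the symptom that only some partitions stay under
replicated.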



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
