Joe Stein created KAFKA-1825:
--------------------------------

             Summary: leadership election state is stale and never recovers without all brokers restarting
                 Key: KAFKA-1825
                 URL:
             Project: Kafka
          Issue Type: Bug
    Affects Versions:, 0.8.2
            Reporter: Joe Stein
            Priority: Critical
             Fix For: 0.8.2

I am not sure what the cause is here, but I can succinctly and repeatedly reproduce this issue. I tried with and 0.8.2-beta and both behave in the same manner.

The code to reproduce this is here

scenario: 3 brokers, 1 zookeeper, 1 client (each an AWS c3.2xlarge instance)

create topic
producer client sends in 380,000 messages/sec (attached executable)

Everything is fine until you kill -SIGTERM broker #2; at that point the state goes bad for that topic. Even trying to use the console producer (with the sarama producer off) doesn't work.

Doing a describe, the yoyoma topic looks fine. Ran preferred leadership election, lots of issues... still can't produce... the only resolution is bouncing all brokers :(

root@ip-10-233-52-139:/opt/kafka_2.10- bin/ --zookeeper --describe
Topic:yoyoma PartitionCount:36 ReplicationFactor:3 Configs:
Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 2 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 4 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 5 Leader: 1 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 7 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 8 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 10 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 11 Leader: 1 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 13 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 14 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 16 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 17 Leader: 1 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 19 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 20 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 22 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 23 Leader: 1 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 25 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 26 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 28 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 29 Leader: 1 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 30 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 31 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 32 Leader: 1 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 33 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 34 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 35 Leader: 1 Replicas: 3,2,1 Isr: 1,3

root@ip-10-233-52-139:/opt/kafka_2.10- bin/ --zookeeper
Successfully started preferred replica election for partitions Set([yoyoma,29], [yoyoma,14], [yoyoma,22], [yoyoma,15], [yoyoma,3], [yoyoma,11], [yoyoma,32], [yoyoma,23], [yoyoma,18], [yoyoma,25], [yoyoma,26], [yoyoma,1], [yoyoma,9], [yoyoma,33], [yoyoma,5], [yoyoma,12], [yoyoma,20], [yoyoma,4], [yoyoma,7], [yoyoma,24], [yoyoma,35], [yoyoma,10], [yoyoma,8], [yoyoma,2], [yoyoma,21], [yoyoma,31], [yoyoma,28], [yoyoma,19], [yoyoma,16], [yoyoma,13], [yoyoma,34], [yoyoma,0], [test-1210,0], [yoyoma,30], [yoyoma,27], [yoyoma,17], [yoyoma,6])

[2014-12-19 18:33:56,228] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [yoyoma,29],[yoyoma,14],[yoyoma,11],[yoyoma,32],[yoyoma,23],[yoyoma,26],[yoyoma,5],[yoyoma,20],[yoyoma,35],[yoyoma,8],[yoyoma,2],[yoyoma,17] (kafka.server.ReplicaFetcherManager)
[2014-12-19 18:33:56,229] INFO Truncating log yoyoma-29 to offset 6481451. (kafka.log.Log)
[2014-12-19 18:33:56,229] INFO Truncating log yoyoma-14 to offset 6469671. (kafka.log.Log)
[2014-12-19 18:33:56,229] INFO Truncating log yoyoma-11 to offset 6472578. (kafka.log.Log)
[2014-12-19 18:33:56,229] INFO Truncating log yoyoma-32 to offset 6481923. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-23 to offset 6473039. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-26 to offset 6478089. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-5 to offset 6473159. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-20 to offset 6474790. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-35 to offset 6482661. (kafka.log.Log)
[2014-12-19 18:33:56,230] INFO Truncating log yoyoma-8 to offset 6467814. (kafka.log.Log)
[2014-12-19 18:33:56,231] INFO Truncating log yoyoma-2 to offset 6477942. (kafka.log.Log)
[2014-12-19 18:33:56,231] INFO Truncating log yoyoma-17 to offset 6476136. (kafka.log.Log)
[2014-12-19 18:33:56,241] INFO [ReplicaFetcherThread-2-3], Starting (kafka.server.ReplicaFetcherThread)
[2014-12-19 18:33:56,243] INFO [ReplicaFetcherThread-1-3], Starting (kafka.server.ReplicaFetcherThread)
[2014-12-19 18:33:56,244] INFO [ReplicaFetcherThread-3-3], Starting (kafka.server.ReplicaFetcherThread)
[2014-12-19 18:33:56,245] INFO [ReplicaFetcherThread-0-3], Starting (kafka.server.ReplicaFetcherThread)
[2014-12-19 18:33:56,245] INFO [ReplicaFetcherManager on broker 1] Added fetcher for partitions ArrayBuffer([[yoyoma,23], initOffset 6473039 to broker id:3,host:,port:9092] , [[yoyoma,17], initOffset 6476136 to broker id:3,host:,port:9092] , [[yoyoma,32], initOffset 6481923 to broker id:3,host:,port:9092] , [[yoyoma,14], initOffset 6469671 to broker id:3,host:,port:9092] , [[yoyoma,20], initOffset 6474790 to broker id:3,host:,port:9092] , [[yoyoma,8], initOffset 6467814 to broker id:3,host:,port:9092] , [[yoyoma,5], initOffset 6473159 to broker id:3,host:,port:9092] , [[yoyoma,35], initOffset 6482661 to broker id:3,host:,port:9092] , [[yoyoma,2], initOffset 6477942 to broker id:3,host:,port:9092] , [[yoyoma,11], initOffset 6472578 to broker id:3,host:,port:9092] , [[yoyoma,26], initOffset 6478089 to broker id:3,host:,port:9092] , [[yoyoma,29], initOffset 6481451 to broker id:3,host:,port:9092] ) (kafka.server.ReplicaFetcherManager)
[2014-12-19 18:33:56,289] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,29] failed due to Leader not local for partition [yoyoma,29] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,5] failed due to Leader not local for partition [yoyoma,5] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,17] failed due to Leader not local for partition [yoyoma,17] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,11] failed due to Leader not local for partition [yoyoma,11] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,23] failed due to Leader not local for partition [yoyoma,23] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,35] failed due to Leader not local for partition [yoyoma,35] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,14] failed due to Leader not local for partition [yoyoma,14] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,26] failed due to Leader not local for partition [yoyoma,26] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,291] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,2] failed due to Leader not local for partition [yoyoma,2] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,32] failed due to Leader not local for partition [yoyoma,32] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,20] failed due to Leader not local for partition [yoyoma,20] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,8] failed due to Leader not local for partition [yoyoma,8] on broker 1 (kafka.server.KafkaApis)

root@ip-10-233-52-139:/opt/kafka_2.10- bin/ --zookeeper --describe
Topic:yoyoma PartitionCount:36 ReplicationFactor:3 Configs:
Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 4 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 7 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 8 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 10 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 11 Leader: 3 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 13 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 14 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 16 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 17 Leader: 3 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 19 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 20 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 22 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 23 Leader: 3 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 25 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 26 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 28 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 29 Leader: 3 Replicas: 3,2,1 Isr: 1,3
Topic: yoyoma Partition: 30 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 31 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 32 Leader: 3 Replicas: 3,1,2 Isr: 1,3
Topic: yoyoma Partition: 33 Leader: 1 Replicas: 1,3,2 Isr: 1,3
Topic: yoyoma Partition: 34 Leader: 1 Replicas: 2,1,3 Isr: 1,3
Topic: yoyoma Partition: 35 Leader: 3 Replicas: 3,2,1 Isr: 1,3

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
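
As a reading aid for the describe output in the report (not part of the original message): a minimal Python sketch, assuming the per-partition rows have the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." shape shown there. It relies on the convention that the preferred replica is the first broker in the Replicas list. The helper names `parse_describe` and `summarize` are hypothetical, and the three sample rows are copied from the second describe output.

```python
import re

# Matches one partition row of describe output, e.g.
# "Topic: yoyoma Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 1,3".
# The summary row ("Topic:yoyoma PartitionCount:36 ...") does not match.
ROW = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_describe(text):
    """Return one dict per partition row found in the describe output."""
    rows = []
    for m in ROW.finditer(text):
        topic, part, leader, replicas, isr = m.groups()
        rows.append({
            "topic": topic,
            "partition": int(part),
            "leader": int(leader),
            "replicas": [int(r) for r in replicas.split(",")],
            "isr": [int(r) for r in isr.split(",") if r],
        })
    return rows


def summarize(rows):
    """Group partitions by whether the leader is the preferred (first) replica,
    and collect brokers that are assigned but missing from some ISR."""
    preferred = [r["partition"] for r in rows if r["leader"] == r["replicas"][0]]
    not_preferred = [r["partition"] for r in rows if r["leader"] != r["replicas"][0]]
    out_of_sync = sorted({b for r in rows for b in r["replicas"] if b not in r["isr"]})
    return preferred, not_preferred, out_of_sync


# Three rows copied from the describe output taken after the election.
sample = """
Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 1,3
"""
preferred, not_preferred, out_of_sync = summarize(parse_describe(sample))
print(preferred, not_preferred, out_of_sync)  # [0, 2] [1] [2]
```

Run against the full output, a check like this would show that after the election only the partitions whose preferred replica is broker 2 are still led by a non-preferred broker, and that broker 2 is absent from every ISR, which matches the broker that was killed.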