Sounds like you are running into this issue: https://issues.apache.org/jira/browse/KAFKA-1367

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Wed, Jan 21, 2015 at 10:10 AM, svante karlsson <s...@csi.se> wrote:

> We are running an external (as in unsupported) C++ client library
> against 0.8.2-rc2 and see differences between the Isr vector in the
> Metadata Response and what ./kafka-topics.sh --describe returns.
>
> We have a triple-replicated topic that is not updated during the test.
>
> kafka-topics.sh returns:
>
> Topic: saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,1,3
> Topic: saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 2,1,3
>
> After some debugging of the received packet, it seems the data is actually
> missing from the server's response.
>
> After a sequential restart of each broker, everything was back to normal.
>
> The client logs below show two pairs of log lines every 10s.
>
> initial state:
>
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
>
> restart broker 1
>
> handle_connect_retry_timer
> _connect_async_next z8r102-mc12-4-4.sth-tc2.videoplaza.net:9092
>
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2, 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2, 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> ...
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2, 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2, 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 3, 1,
>
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 3,
>
> restart broker 3
>
> known brokers changed {.... }
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 2 Replicas: 1, 2, Isr: 2, 1,
> known brokers changed { .... }
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 2 Replicas: 3, 1, 2, Isr: 2, 1,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 2 Replicas: 3, 1, 2, Isr: 2, 1,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 2 Replicas: 3, 1, 2, Isr: 2, 1,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,
>
> restart broker 2
>
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 1, 3, 2,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 1, 3, 2,
>
> All this time kafka-topics.sh returns (except for a very short time during
> the restart):
>
> Topic: saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,1,3
> Topic: saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 2,1,3
>
> This seems reproducible by shutting down all brokers at the same time. Then
> the Isr vectors never "heal". Bumping the brokers one by one heals them
> again.
>
> /svante
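
For anyone wanting to check for the same divergence, the comparison described above (client-side Metadata Response output versus `kafka-topics.sh --describe` output) can be done mechanically. The following is only a minimal Python sketch, assuming the two line formats quoted in this thread; `parse_line` and `isr_mismatches` are hypothetical helper names for illustration, not part of Kafka or of any client library.

```python
import re

# Matches both formats quoted in this thread, e.g.
#   "saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,"
#   "Topic: saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,1,3"
LINE_RE = re.compile(
    r"(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,\s]*?)\s*"
    r"Isr:\s*(?P<isr>[\d,\s]*)"
)


def _ids(text):
    """Extract broker ids from a comma-separated list, ignoring trailing commas."""
    return [int(x) for x in re.findall(r"\d+", text)]


def parse_line(line):
    """Parse one partition line; return a dict, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "topic": m.group("topic"),
        "partition": int(m.group("partition")),
        "leader": int(m.group("leader")),
        "replicas": _ids(m.group("replicas")),
        "isr": _ids(m.group("isr")),
    }


def isr_mismatches(client_lines, tool_lines):
    """Return (topic, partition, client_isr, tool_isr) for partitions whose ISR sets differ."""
    tool = {}
    for line in tool_lines:
        rec = parse_line(line)
        if rec is not None:
            tool[(rec["topic"], rec["partition"])] = set(rec["isr"])

    mismatches = []
    for line in client_lines:
        rec = parse_line(line)
        if rec is None:
            continue
        key = (rec["topic"], rec["partition"])
        # Order is ignored: only membership of the ISR is compared.
        if key in tool and set(rec["isr"]) != tool[key]:
            mismatches.append((key[0], key[1], sorted(rec["isr"]), sorted(tool[key])))
    return mismatches


if __name__ == "__main__":
    client = [
        "saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2, 1,",
        "saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2, 1, 3,",
    ]
    tool = [
        "Topic: saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,1,3",
        "Topic: saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 2,1,3",
    ]
    for topic, partition, client_isr, tool_isr in isr_mismatches(client, tool):
        print(f"{topic}/{partition}: client Isr {client_isr} vs kafka-topics.sh Isr {tool_isr}")
```

Run against the "initial state" lines from the report, this flags partition 1, where the client sees brokers 1 and 2 in the Isr while kafka-topics.sh lists 1, 2 and 3. Comparing as sets is a deliberate choice here, since the two sources list the brokers in different orders.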