Hi Ryan, can you help reproduce the issue on virtual machines? To make that easier, I added two more brokers (so five in total now) to the Vagrantfile here: https://github.com/stealthly/kafka/tree/0.8_hubspot_testing_1
To bring the cluster up:

git clone https://github.com/stealthly/kafka
cd kafka
git checkout 0.8_hubspot_testing_1
vagrant up

You will need Vagrant (http://www.vagrantup.com/downloads.html) and
VirtualBox (https://www.virtualbox.org/) installed.

I tried to reproduce it but I'm not sure what steps to take. Or is there an
issue as soon as it launches? Here is what I see:

Joes-MacBook-Air:kafka joestein$ bin/kafka-create-topic.sh --zookeeper 192.168.50.5:2181 --replica 2 --partition 15 --topic hubspot_testing
creation succeeded!

Joes-MacBook-Air:kafka joestein$ bin/kafka-list-topic.sh --zookeeper 192.168.50.5:2181
topic: hubspot_testing  partition: 0   leader: 3  replicas: 3,1  isr: 3,1
topic: hubspot_testing  partition: 1   leader: 4  replicas: 4,2  isr: 4,2
topic: hubspot_testing  partition: 2   leader: 1  replicas: 1,3  isr: 1,3
topic: hubspot_testing  partition: 3   leader: 2  replicas: 2,4  isr: 2,4
topic: hubspot_testing  partition: 4   leader: 3  replicas: 3,2  isr: 3,2
topic: hubspot_testing  partition: 5   leader: 4  replicas: 4,3  isr: 4,3
topic: hubspot_testing  partition: 6   leader: 1  replicas: 1,4  isr: 1,4
topic: hubspot_testing  partition: 7   leader: 2  replicas: 2,1  isr: 2,1
topic: hubspot_testing  partition: 8   leader: 3  replicas: 3,4  isr: 3,4
topic: hubspot_testing  partition: 9   leader: 4  replicas: 4,1  isr: 4,1
topic: hubspot_testing  partition: 10  leader: 1  replicas: 1,2  isr: 1,2
topic: hubspot_testing  partition: 11  leader: 2  replicas: 2,3  isr: 2,3
topic: hubspot_testing  partition: 12  leader: 3  replicas: 3,1  isr: 3,1
topic: hubspot_testing  partition: 13  leader: 4  replicas: 4,2  isr: 4,2
topic: hubspot_testing  partition: 14  leader: 1  replicas: 1,3  isr: 1,3

Are you using the Oracle JDK? Do you have one topic for the 15 partitions?
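While the cluster is in the bad state, it would also help to watch broker 4's ISR expand/shrink meters and the under-replicated gauge over JMX, per Jun's suggestion below. A rough sketch using kafka.tools.JmxTool (the host, the JMX port, and the exact MBean names here are assumptions on my part; point it at whatever JMX_PORT broker 4 was started with, and adjust the names if your build registers them differently):

# placeholder host/port, run once per metric you want to watch
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://broker4:9999/jmxrmi \
  --object-name '"kafka.server":type="ReplicaManager",name="IsrShrinksPerSec"'
# repeat with name="IsrExpandsPerSec" or name="UnderReplicatedPartitions"
# to watch the expansion rate and the under-replicated count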
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Tue, Dec 17, 2013 at 7:09 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:

> Sorry it's taken so long to reply; the issue went away after I reassigned
> partitions. Now it's back.
>
> I haven't checked JMX, because the brokers and zookeeper have been
> reporting the same ISR for several hours.
>
> Some more details:
>
> The cluster/topic has
> 5 brokers (1, 4, 5, 7, 8)
> 15 partitions (0...14)
> 2 replicas
>
> A single broker, 4, is the one missing from the ISR in every case. For
> partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
> partitions where 4 is not the leader (4, 8, 12), it is not present in the
> ISR. Here's the output of my tool, showing assignment and ISR:
> https://gist.github.com/also/8012383#file-from-brokers-txt
>
> I haven't seen anything interesting in the logs, but I'm not entirely sure
> what to look for. The cluster is currently in this state, and if it goes
> like last time, this will persist until I reassign partitions.
>
> What can I do in the meantime to track down the issue?
>
> Thanks,
>
> Ryan
>
> On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Do you see any ISR churn on the brokers? You can check the ISR
> > expand/shrink rate JMX metrics.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:
> >
> > > I'm working on some monitoring tools for Kafka, and I've seen a couple
> > > of clusters get into a state where ClientUtils.fetchTopicMetadata will
> > > show that not all replicas are in the ISR.
> > >
> > > At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show
> > > that all partitions are in the ISR, and the
> > > "kafka.server":name="UnderReplicatedPartitions",type="ReplicaManager"
> > > MBean will report 0.
> > >
> > > What's going on? Is there something wrong with my controller, or
> > > should I not be paying attention to ClientUtils.fetchTopicMetadata?
> > >
> > > Thanks,
> > >
> > > Ryan
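Re: the ZkUtils.getLeaderIsrAndEpochForPartition vs. ClientUtils.fetchTopicMetadata question above: while the cluster is stuck like this, you can also read the partition state znode straight out of ZooKeeper and compare its "isr" field with what the brokers return in the metadata response. A rough sketch against my test cluster above (assuming your build ships bin/zookeeper-shell.sh, otherwise any ZooKeeper CLI works, and assuming the default znode layout with no chroot; substitute your own ZooKeeper address, topic, and one of the affected partitions):

bin/zookeeper-shell.sh 192.168.50.5:2181
# then, at the prompt, dump the state the controller wrote for partition 4:
get /brokers/topics/hubspot_testing/partitions/4/state
# prints JSON along the lines of
# {"controller_epoch":...,"leader":3,"version":1,"leader_epoch":...,"isr":[3,2]}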