Sorry it's taken so long to reply. The issue went away after I reassigned partitions, but now it's back.
I haven't checked JMX, because the brokers and ZooKeeper have been reporting the same ISR for several hours.

Some more details. The cluster/topic has:
5 brokers (1, 4, 5, 7, 8)
15 partitions (0...14)
2 replicas

A single broker, 4, is the one missing from the ISR in every case. For partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For partitions where 4 is not the leader (4, 8, 12), it is not present in the ISR.

Here's the output of my tool, showing assignment and ISR:
https://gist.github.com/also/8012383#file-from-brokers-txt

I haven't seen anything interesting in the logs, but I'm not entirely sure what to look for. The cluster is currently in this state, and if it goes like last time, this will persist until I reassign partitions. What can I do in the meantime to track down the issue?

Thanks,

Ryan

On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <jun...@gmail.com> wrote:

> Do you see any ISR churn on the brokers? You can check the ISR
> expand/shrink rate JMX metrics.
>
> Thanks,
>
> Jun
>
>
> On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:
>
> > I'm working on some monitoring tools for Kafka, and I've seen a couple of
> > clusters get into a state where ClientUtils.fetchTopicMetadata will show
> > that not all replicas are in the ISR.
> >
> > At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that
> > all replicas are in the ISR, and
> > the "kafka.server":name="UnderReplicatedPartitions",type="ReplicaManager"
> > MBean will report 0.
> >
> > What's going on? Is there something wrong with my controller, or should I
> > not be paying attention to ClientUtils.fetchTopicMetadata?
> >
> > Thanks,
> >
> > Ryan
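For anyone reproducing the check described in this thread, here is a minimal sketch of the comparison, assuming the replica assignment and the two ISR views (broker metadata vs. ZooKeeper) have already been fetched and flattened into plain dicts of partition ID to broker IDs. The function names `missing_from_isr` and `isr_disagreements` and the sample data are hypothetical and made up for illustration; this is not the tool from the gist.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: partition ID -> list of broker IDs.
PartitionMap = Dict[int, List[int]]


def missing_from_isr(assignment: PartitionMap, isr: PartitionMap) -> PartitionMap:
    """Return, per partition, the assigned replicas that are absent from the ISR."""
    missing: PartitionMap = {}
    for partition, replicas in sorted(assignment.items()):
        in_sync = set(isr.get(partition, []))
        absent = [broker for broker in replicas if broker not in in_sync]
        if absent:
            missing[partition] = absent
    return missing


def isr_disagreements(
    metadata_isr: PartitionMap, zk_isr: PartitionMap
) -> Dict[int, Tuple[List[int], List[int]]]:
    """Return partitions whose ISR differs between the two sources, as (metadata, zk)."""
    result: Dict[int, Tuple[List[int], List[int]]] = {}
    for partition in sorted(set(metadata_isr) | set(zk_isr)):
        from_metadata = sorted(metadata_isr.get(partition, []))
        from_zk = sorted(zk_isr.get(partition, []))
        if from_metadata != from_zk:
            result[partition] = (from_metadata, from_zk)
    return result


# Made-up sample data: broker 4 drops out of the metadata ISR only
# for the partition it does not lead.
assignment = {1: [4, 5], 4: [7, 4]}
metadata_isr = {1: [4, 5], 4: [7]}
zk_isr = {1: [4, 5], 4: [7, 4]}

print(missing_from_isr(assignment, metadata_isr))  # {4: [4]}
print(missing_from_isr(assignment, zk_isr))        # {}
print(isr_disagreements(metadata_isr, zk_isr))     # {4: ([7], [4, 7])}
```

Comparing the two views side by side, rather than only counting under-replicated partitions, makes it obvious when the brokers' cached metadata and ZooKeeper disagree, which is the symptom the original question describes.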