If a broker never joins an ISR, it could be that the fetcher died unexpectedly. Did you see any "Error due to " in the log of broker 4?
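
A minimal sketch of that log check, assuming the broker log is a plain text file you can read line by line. The function name `find_fetcher_errors` is hypothetical and is not part of Kafka.

```python
from typing import Iterable, List

# The string quoted in the message above, including its trailing space.
MARKER = "Error due to "


def find_fetcher_errors(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return every log line that contains the marker, without its trailing newline."""
    return [line.rstrip("\n") for line in lines if marker in line]
```

To use it, open broker 4's server log (the path depends on your installation) and pass the file object to `find_fetcher_errors`. An empty result only means the string was not found in that file, so rotated log files may need checking too.
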
Another thing to check is the max lag and the per-partition lag in JMX.

Thanks,

Jun


On Tue, Dec 17, 2013 at 4:09 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:

> Sorry it's taken so long to reply, the issue went away after I reassigned
> partitions. Now it's back.
>
> I haven't checked JMX, because the brokers and zookeeper have been
> reporting the same ISR for several hours.
>
> Some more details:
>
> The cluster/topic has
> 5 brokers (1, 4, 5, 7, 8)
> 15 partitions (0...14)
> 2 replicas
>
> A single broker, 4, is the one missing from the ISR in every case. For
> partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
> partitions where 4 is not the leader (4, 8, 12), it is not present in the
> ISR. Here's the output of my tool, showing assignment and ISR:
> https://gist.github.com/also/8012383#file-from-brokers-txt
>
> I haven't seen anything interesting in the logs, but I'm not entirely sure
> what to look for. The cluster is currently in this state, and if it goes
> like last time, this will persist until I reassign partitions.
>
> What can I do in the meantime to track down the issue?
>
> Thanks,
>
> Ryan
>
> On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Do you see any ISR churns on the brokers? You can check the ISR
> > expand/shrink rate jmx.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:
> >
> > > I'm working on some monitoring tools for Kafka, and I've seen a couple of
> > > clusters get into a state where ClientUtils.fetchTopicMetadata will show
> > > that not all replicas are in the ISR.
> > >
> > > At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that
> > > all partitions are in the ISR, and the
> > > "kafka.server":name="UnderReplicatedPartitions",type="ReplicaManager"
> > > MBean will report 0.
> > >
> > > What's going on? Is there something wrong with my controller, or should I
> > > not be paying attention to ClientUtils.fetchTopicMetadata?
> > >
> > > Thanks,
> > >
> > > Ryan
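
The disagreement described in the original question (one ISR reported by the brokers, another stored in ZooKeeper) can be made explicit by diffing the two views partition by partition. The following is a rough sketch only. It assumes both ISR maps have already been collected by some other means, for example from ClientUtils.fetchTopicMetadata and ZkUtils.getLeaderIsrAndEpochForPartition. The name `diff_isr` is hypothetical, and the sample data is made up for illustration rather than taken from the cluster in this thread.

```python
from typing import Dict, List, Tuple

Isr = Dict[int, List[int]]  # partition id -> broker ids in the ISR


def diff_isr(from_brokers: Isr, from_zookeeper: Isr) -> Dict[int, Tuple[List[int], List[int]]]:
    """Return the partitions whose ISR differs between the two views.

    Each value is (only_in_zookeeper, only_in_brokers), both sorted.
    """
    mismatches: Dict[int, Tuple[List[int], List[int]]] = {}
    for partition in sorted(set(from_brokers) | set(from_zookeeper)):
        broker_view = set(from_brokers.get(partition, []))
        zk_view = set(from_zookeeper.get(partition, []))
        if broker_view != zk_view:
            mismatches[partition] = (
                sorted(zk_view - broker_view),
                sorted(broker_view - zk_view),
            )
    return mismatches


# Made-up sample data: ZooKeeper lists broker 4 in the ISR of partition 4,
# while the broker metadata does not.
from_brokers = {0: [1, 5], 4: [7]}
from_zookeeper = {0: [1, 5], 4: [4, 7]}
print(diff_isr(from_brokers, from_zookeeper))  # {4: ([4], [])}
```

Running a comparison like this periodically and logging the result would show whether the two views ever converge or whether the mismatch persists until partitions are reassigned.
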