If a broker never joins an ISR, it could be that the fetcher died
unexpectedly. Did you see any "Error due to " in the log of broker 4?

Another thing to check is the max lag and the per partition lag in jmx.
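
From the monitoring-tool side, one way to see whether a broker is missing from the ISR only on partitions where it is a follower (which would point at its fetcher rather than at the broker as a whole) is to compare assignment and ISR per partition. The following is a minimal sketch, not part of any Kafka API: the function name, the dictionary layout, and the snapshot are all hypothetical. The partition numbers and broker 4's role follow the description in the quoted message, but the other replica in each pair is invented for illustration.

```python
from collections import defaultdict


def followers_missing_from_isr(partitions):
    """Map broker id -> partitions where it is an assigned follower but not in the ISR."""
    missing = defaultdict(list)
    for p in partitions:
        for broker in p["replicas"]:
            # The leader is in its own ISR by definition, so only followers are checked.
            if broker != p["leader"] and broker not in p["isr"]:
                missing[broker].append(p["partition"])
    return dict(missing)


# Hypothetical snapshot: the brokers paired with broker 4 are made up.
snapshot = [
    {"partition": 1, "leader": 4, "replicas": [4, 5], "isr": [4, 5]},
    {"partition": 4, "leader": 7, "replicas": [7, 4], "isr": [7]},
    {"partition": 8, "leader": 1, "replicas": [1, 4], "isr": [1]},
    {"partition": 12, "leader": 8, "replicas": [8, 4], "isr": [8]},
]

print(followers_missing_from_isr(snapshot))  # {4: [4, 8, 12]}
```

If a single broker shows up here for every partition it follows and for none that it leads, that is consistent with a dead fetcher on that broker, though the logs and the lag metrics are still needed to confirm it.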

Thanks,

Jun


On Tue, Dec 17, 2013 at 4:09 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:

> Sorry it's taken so long to reply, the issue went away after I reassigned
> partitions. Now it's back.
>
> I haven't checked JMX, because the brokers and zookeeper have been
> reporting the same ISR for several hours.
>
> Some more details:
>
> The cluster/topic has
>   5 brokers (1, 4, 5, 7, 8)
>   15 partitions (0...14)
>   2 replicas
>
> A single broker, 4, is the one missing from the ISR in every case. For
> partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
> partitions where 4 is not the leader (4, 8, 12), it is not present in the
> ISR. Here's the output of my tool, showing assignment and ISR:
> https://gist.github.com/also/8012383#file-from-brokers-txt
>
> I haven't seen anything interesting in the logs, but I'm not entirely sure
> what to look for. The cluster is currently in this state, and if it goes
> like last time, this will persist until I reassign partitions.
>
> What can I do in the meantime to track down the issue?
>
> Thanks,
>
> Ryan
>
> On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Do you see any ISR churns on the brokers? You can check the ISR
> > expand/shrink rate jmx.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen <rberd...@hubspot.com> wrote:
> >
> > > I'm working on some monitoring tools for Kafka, and I've seen a couple of
> > > clusters get into a state where ClientUtils.fetchTopicMetadata will show
> > > that not all replicas are in the ISR.
> > >
> > > At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will show that
> > > all partitions are in the ISR, and the
> > > "kafka.server":name="UnderReplicatedPartitions",type="ReplicaManager"
> > > MBean will report 0.
> > >
> > > What's going on? Is there something wrong with my controller, or should I
> > > not be paying attention to ClientUtils.fetchTopicMetadata?
> > >
> > > Thanks,
> > >
> > > Ryan
> > >
> >
>
