The value of under-replicated partitions is 0 across the cluster.

Thanks,
Shone
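(For reference, one way to read that metric on each broker is Kafka's
bundled JmxTool. The broker host and JMX port below are placeholders,
and on 0.8.0-beta1 the MBean may still use the older quoted naming
style, so treat the object name as an assumption to verify against your
brokers:)

bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi \
  --object-name 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'

JmxTool polls on an interval, so Ctrl-C once you have a reading. A value
of 0 on every broker, as reported above, suggests the replicas really
are in sync and the stale ISR lives only in ZooKeeper.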
On Mon, May 19, 2014 at 12:23 AM, Jun Rao <jun...@gmail.com> wrote:

> What's the value of under-replicated partitions JMX in each broker?
>
> Thanks,
>
> Jun
>
> On Sat, May 17, 2014 at 6:16 PM, Paul Mackles <pmack...@adobe.com> wrote:
>
> > Today we did a rolling restart of ZK. We also restarted the Kafka
> > controller and ISRs are still not being updated in ZK. Again, the
> > cluster seems fine and the replicas in question do appear to be
> > getting updated. I am guessing there must be some bad state
> > persisted in ZK.
> >
> > On 5/17/14 7:50 PM, "Shone Sadler" <shone.sad...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > I work with Paul and am monitoring the cluster as well. The status
> > > has not changed.
> > >
> > > When we execute kafka-list-topic we see the following (showing one
> > > of the two partitions having the problem):
> > >
> > > topic: t1  partition: 33  leader: 1  replicas: 1,2,3  isr: 1
> > >
> > > When inspecting the logs of the leader, I do see a spurt of ISR
> > > shrinkage/expansion around the time that the brokers were
> > > partitioned from ZK, but nothing past the last message "Cached
> > > zkVersion [17] not equal to that in zookeeper" from yesterday.
> > > There are no constant changes to the ISR list.
> > >
> > > Is there a way to force the leader to update ZK with the latest
> > > ISR list?
> > >
> > > Thanks,
> > > Shone
> > >
> > > Logs:
> > >
> > > cat server.log | grep "\[t1,33\]"
> > >
> > > [2014-04-18 10:16:32,814] INFO [ReplicaFetcherManager on broker 1]
> > > Removing fetcher for partition [t1,33]
> > > (kafka.server.ReplicaFetcherManager)
> > > [2014-05-13 19:42:10,784] ERROR [KafkaApi-1] Error when processing
> > > fetch request for partition [t1,33] offset 330118156 from consumer
> > > with correlation id 0 (kafka.server.KafkaApis)
> > > [2014-05-14 11:02:25,255] ERROR [KafkaApi-1] Error when processing
> > > fetch request for partition [t1,33] offset 332896470 from consumer
> > > with correlation id 0 (kafka.server.KafkaApis)
> > > [2014-05-16 12:00:11,344] INFO Partition [t1,33] on broker 1:
> > > Shrinking ISR for partition [t1,33] from 3,1,2 to 1
> > > (kafka.cluster.Partition)
> > > [2014-05-16 12:00:18,009] INFO Partition [t1,33] on broker 1:
> > > Cached zkVersion [17] not equal to that in zookeeper, skip
> > > updating ISR (kafka.cluster.Partition)
> > > [2014-05-16 13:33:11,344] INFO Partition [t1,33] on broker 1:
> > > Shrinking ISR for partition [t1,33] from 3,1,2 to 1
> > > (kafka.cluster.Partition)
> > > [2014-05-16 13:33:12,403] INFO Partition [t1,33] on broker 1:
> > > Cached zkVersion [17] not equal to that in zookeeper, skip
> > > updating ISR (kafka.cluster.Partition)
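(One way to see exactly what ZooKeeper holds for the affected partition,
and the version the broker's cache is compared against, is to read the
partition state znode directly with ZooKeeper's own CLI. The path below
assumes the standard 0.8 znode layout and no chroot prefix:)

bin/zkCli.sh -server zk1:2181 get /brokers/topics/t1/partitions/33/state

The payload should be JSON along the lines of
{"controller_epoch":N,"leader":1,"leader_epoch":N,"isr":[1],"version":1},
and the dataVersion in the accompanying stat output is the zkVersion
that the "Cached zkVersion [17] not equal" log message refers to.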
> > > On Sat, May 17, 2014 at 11:44 AM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > >> Do you see constant ISR shrinking/expansion of those two
> > >> partitions in the leader broker's log?
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Fri, May 16, 2014 at 4:25 PM, Paul Mackles
> > >> <pmack...@adobe.com> wrote:
> > >>
> > >> > Hi - We are running kafka_2.8.0-0.8.0-beta1 (we are a little
> > >> > behind in upgrading).
> > >> >
> > >> > From what I can tell, connectivity to ZK was lost for a brief
> > >> > period. The cluster seemed to recover OK except that we now
> > >> > have 2 (out of 125) partitions where the ISR appears to be out
> > >> > of date. In other words, kafka-list-topic is showing only one
> > >> > replica in the ISR for the 2 partitions in question (there
> > >> > should be 3).
> > >> >
> > >> > What's odd is that in looking at the log segments for those
> > >> > partitions on the file system, I can see that they are in fact
> > >> > getting updated and by all measures look to be in sync. I can
> > >> > also see that the brokers where the out-of-sync replicas reside
> > >> > are doing fine and leading other partitions like nothing ever
> > >> > happened. Based on that, it seems like the ISR in ZK is just
> > >> > out of date due to a botched recovery from the brief ZK outage.
> > >> >
> > >> > Has anyone seen anything like this before? I saw this ticket,
> > >> > which sounded similar:
> > >> >
> > >> > https://issues.apache.org/jira/browse/KAFKA-948
> > >> >
> > >> > Anyone have any suggestions for recovering from this state? I
> > >> > was thinking of running the preferred-replica-election tool
> > >> > next to see if that gets the ISRs in ZK back in sync.
> > >> >
> > >> > After that, I guess the next step would be to bounce the Kafka
> > >> > servers in question.
> > >> >
> > >> > Thanks,
> > >> > Paul
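(If you do try the preferred-replica-election tool, it can be scoped to
just the affected partitions rather than the whole cluster; the JSON
file name below is arbitrary and the flags assume the 0.8 version of
the tool:)

cat > partitions.json <<'EOF'
{"partitions": [{"topic": "t1", "partition": 33}]}
EOF

bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181 \
  --path-to-json-file partitions.json

Note this only asks the controller to move leadership back to the
preferred replica; whether the resulting LeaderAndIsr update also
replaces the stale ISR in ZK is exactly what a test like this would
confirm.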