Hi Jun,

I work with Paul and am monitoring the cluster as well. The status has
not changed.

When we execute kafka-list-topic, we see the following (showing one of
the two partitions with the problem):

topic: t1 partition: 33 leader: 1 replicas: 1,2,3 isr: 1
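
For comparison, the ISR as stored in ZK can be read directly, along
with the znode's dataVersion (which, as I understand it, is what the
broker compares its cached zkVersion [17] against). A minimal sketch,
assuming the standard 0.8 znode layout and a placeholder ZK host:

# read the partition state znode; the stat lines in the output
# include dataVersion, and the data is JSON with leader/leader_epoch/isr
bin/zookeeper-shell.sh zkhost:2181 get /brokers/topics/t1/partitions/33/state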

When inspecting the logs of the leader (broker 1), I do see a spurt of
ISR shrinkage/expansion around the time that the brokers were
partitioned from ZK, but nothing past the last message "Cached
zkVersion [17] not equal to that in zookeeper" from yesterday. There
are no constant changes to the ISR list.

Is there a way to force the leader to update ZK with the latest ISR list?
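
(For reference, Paul's preferred-replica-election idea below would be
invoked roughly like this; a sketch only, with zkhost:2181 as a
placeholder, and an optional JSON file limiting it to the affected
partitions:

# run a preferred replica election, scoped via a partition list
bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181 \
  --path-to-json-file partitions.json

where partitions.json would contain something like
{"partitions":[{"topic":"t1","partition":33}]}. We have not tried it
against this state yet.)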

Thanks,
Shone

Logs:

cat server.log | grep "\[t1,33\]"

[2014-04-18 10:16:32,814] INFO [ReplicaFetcherManager on broker 1] Removing
fetcher for partition [t1,33] (kafka.server.ReplicaFetcherManager)
[2014-05-13 19:42:10,784] ERROR [KafkaApi-1] Error when processing fetch
request for partition [t1,33] offset 330118156 from consumer with
correlation id 0 (kafka.server.KafkaApis)
[2014-05-14 11:02:25,255] ERROR [KafkaApi-1] Error when processing fetch
request for partition [t1,33] offset 332896470 from consumer with
correlation id 0 (kafka.server.KafkaApis)
[2014-05-16 12:00:11,344] INFO Partition [t1,33] on broker 1: Shrinking ISR
for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
[2014-05-16 12:00:18,009] INFO Partition [t1,33] on broker 1: Cached
zkVersion [17] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
[2014-05-16 13:33:11,344] INFO Partition [t1,33] on broker 1: Shrinking ISR
for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
[2014-05-16 13:33:12,403] INFO Partition [t1,33] on broker 1: Cached
zkVersion [17] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
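
One hedged guess: if the controller moved during the ZK blip and
rewrote the partition state znode (bumping its version), the broker's
cached zkVersion would be stale, which would match the "skip updating
ISR" messages above. The current controller and its epoch can be
checked with (zkhost:2181 again a placeholder):

# identify the active controller and its epoch
bin/zookeeper-shell.sh zkhost:2181 get /controller
bin/zookeeper-shell.sh zkhost:2181 get /controller_epoch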


On Sat, May 17, 2014 at 11:44 AM, Jun Rao <jun...@gmail.com> wrote:

> Do you see constant ISR shrinking/expansion of those two partitions in the
> leader broker's log?
>
> Thanks,
>
> Jun
>
>
> On Fri, May 16, 2014 at 4:25 PM, Paul Mackles <pmack...@adobe.com> wrote:
>
> > Hi - We are running kafka_2.8.0-0.8.0-beta1 (we are a little behind in
> > upgrading).
> >
> > From what I can tell, connectivity to ZK was lost for a brief period. The
> > cluster seemed to recover OK except that we now have 2 (out of 125)
> > partitions where the ISR appears to be out of date. In other words,
> > kafka-list-topic is showing only one replica in the ISR for the 2
> > partitions in question (there should be 3).
> >
> > What's odd is that in looking at the log segments for those partitions on
> > the file system, I can see that they are in fact getting updated and by
> > all
> > measures look to be in sync. I can also see that the brokers where the
> > out-of-sync replicas reside are doing fine and leading other partitions
> > like nothing ever happened. Based on that, it seems like the ISR in ZK is
> > just out-of-date due to a botched recovery from the brief ZK outage.
> >
> > Has anyone seen anything like this before? I saw this ticket which
> > sounded
> > similar:
> >
> > https://issues.apache.org/jira/browse/KAFKA-948
> >
> > Anyone have any suggestions for recovering from this state? I was
> > thinking
> > of running the preferred-replica-election tool next to see if that gets
> > the
> > ISRs in ZK back in sync.
> >
> > After that, I guess the next step would be to bounce the kafka servers in
> > question.
> >
> > Thanks,
> > Paul
> >
> >
>
