[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133716#comment-15133716
 ] 

Elias Levy edited comment on KAFKA-2729 at 2/5/16 5:55 AM:
-----------------------------------------------------------

Had the same issue happen here while testing a 5-node Kafka cluster with a 
3-node ZK ensemble on Kubernetes on AWS.  After running for a while, broker 2 
started showing the "Cached zkVersion [29] not equal to that in zookeeper, skip 
updating ISR" error message for all the partitions it leads.  For those 
partitions it is the only in-sync replica.  That has caused the Samza jobs I 
was running to stop.

I should note that I am running 0.9.0.0.



> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2729
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of under-replicated partitions. The zookeeper 
> cluster recovered; however, we continued to see a large number of 
> under-replicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This happened for all of the topics on the affected brokers. Both brokers 
> only recovered after a restart. Our own investigation yielded nothing; I was 
> hoping you could shed some light on this issue, and possibly on whether it's 
> related to https://issues.apache.org/jira/browse/KAFKA-1382 , although we're 
> using 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
