[ https://issues.apache.org/jira/browse/KAFKA-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195593#comment-15195593 ]
Mayuresh Gharat commented on KAFKA-3390:
----------------------------------------

Do you mean that even after the topic got completely removed, the ReplicaManager on the leader broker kept on trying to shrink the ISR? Am I understanding it correctly?

-Mayuresh

> ReplicaManager may infinitely try-fail to shrink ISR set of deleted partition
> -----------------------------------------------------------------------------
>
> Key: KAFKA-3390
> URL: https://issues.apache.org/jira/browse/KAFKA-3390
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.1
> Reporter: Stevo Slavic
> Assignee: Mayuresh Gharat
>
> For a topic whose deletion has been requested, the Kafka replica manager may end up infinitely trying and failing to shrink the ISR.
> Here is a fragment from server.log where this recurring, never-ending condition has been observed:
> {noformat}
> [2016-03-04 09:42:13,894] INFO Partition [foo,0] on broker 1: Shrinking ISR for partition [foo,0] from 1,3,2 to 1 (kafka.cluster.Partition)
> [2016-03-04 09:42:13,897] WARN Conditional update of path /brokers/topics/foo/partitions/0/state with data {"controller_epoch":53,"leader":1,"version":1,"leader_epoch":34,"isr":[1]} and expected version 68 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/foo/partitions/0/state (kafka.utils.ZkUtils)
> [2016-03-04 09:42:13,898] INFO Partition [foo,0] on broker 1: Cached zkVersion [68] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> [2016-03-04 09:42:23,894] INFO Partition [foo,0] on broker 1: Shrinking ISR for partition [foo,0] from 1,3,2 to 1 (kafka.cluster.Partition)
> [2016-03-04 09:42:23,897] WARN Conditional update of path /brokers/topics/foo/partitions/0/state with data {"controller_epoch":53,"leader":1,"version":1,"leader_epoch":34,"isr":[1]} and expected version 68 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/foo/partitions/0/state (kafka.utils.ZkUtils)
> [2016-03-04 09:42:23,897] INFO Partition [foo,0] on broker 1: Cached zkVersion [68] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> [2016-03-04 09:42:33,894] INFO Partition [foo,0] on broker 1: Shrinking ISR for partition [foo,0] from 1,3,2 to 1 (kafka.cluster.Partition)
> [2016-03-04 09:42:33,897] WARN Conditional update of path /brokers/topics/foo/partitions/0/state with data {"controller_epoch":53,"leader":1,"version":1,"leader_epoch":34,"isr":[1]} and expected version 68 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/foo/partitions/0/state (kafka.utils.ZkUtils)
> [2016-03-04 09:42:33,897] INFO Partition [foo,0] on broker 1: Cached zkVersion [68] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> ...
> {noformat}
> Before topic deletion was requested, this was the state in ZK of its sole partition:
> {noformat}
> Zxid: 0x1800001045
> Cxid: 0xc92
> Client id: 0x3532dd88fd20000
> Time: Mon Feb 29 16:46:23 CET 2016
> Operation: setData
> Path: /brokers/topics/foo/partitions/0/state
> Data: {"controller_epoch":53,"leader":1,"version":1,"leader_epoch":34,"isr":[1,3,2]}
> Version: 68
> {noformat}
> The topic (its sole partition) never had any data published to it.
> I guess that at some point after topic deletion was requested, the partition state first got updated, and this was the updated state:
> {noformat}
> Zxid: 0x180000b0be
> Cxid: 0x141e4
> Client id: 0x3532dd88fd20000
> Time: Fri Mar 04 9:41:52 CET 2016
> Operation: setData
> Path: /brokers/topics/foo/partitions/0/state
> Data: {"controller_epoch":54,"leader":1,"version":1,"leader_epoch":35,"isr":[1,3]}
> Version: 69
> {noformat}
> For whatever reason the replica manager (some cache it uses, I guess ReplicaManager.allPartitions) never sees this update, nor does it see that the partition state, the partition, the partitions node and finally the topic node got deleted:
> {noformat}
> Zxid: 0x180000b0bf
> Cxid: 0x40fb
> Client id: 0x3532dd88fd2000a
> Time: Fri Mar 04 9:41:52 CET 2016
> Operation: delete
> Path: /brokers/topics/foo/partitions/0/state
> ---
> Zxid: 0x180000b0c0
> Cxid: 0x40fe
> Client id: 0x3532dd88fd2000a
> Time: Fri Mar 04 9:41:52 CET 2016
> Operation: delete
> Path: /brokers/topics/foo/partitions/0
> ---
> Zxid: 0x180000b0c1
> Cxid: 0x4100
> Client id: 0x3532dd88fd2000a
> Time: Fri Mar 04 9:41:52 CET 2016
> Operation: delete
> Path: /brokers/topics/foo/partitions
> ---
> Zxid: 0x180000b0c2
> Cxid: 0x4102
> Client id: 0x3532dd88fd2000a
> Time: Fri Mar 04 9:41:52 CET 2016
> Operation: delete
> Path: /brokers/topics/foo
> {noformat}
> It just keeps on trying, every {{replica.lag.time.max.ms}}, to shrink the ISR, even for a partition/topic that has been deleted.
> Broker 1 was the controller in the cluster; notice that the same broker was also the leader for the partition before it was deleted.
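The failure mode in the quoted log fragment comes down to a versioned (conditional) setData against a znode that no longer exists. Below is a minimal sketch using the plain ZooKeeper Java client, not the Kafka code itself; the connect string, session timeout, and class name are placeholder values. It illustrates why, once the state znode has been deleted, retrying the same conditional update can never succeed:

{noformat}
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ConditionalIsrUpdateSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder connect string and session timeout; no watcher logic needed here.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

        String path = "/brokers/topics/foo/partitions/0/state";
        byte[] shrunkIsr =
            "{\"controller_epoch\":53,\"leader\":1,\"version\":1,\"leader_epoch\":34,\"isr\":[1]}"
                .getBytes(StandardCharsets.UTF_8);
        int cachedZkVersion = 68; // version the broker remembers from its last successful read/write

        try {
            // Conditional update: succeeds only if the znode exists AND its version still matches.
            zk.setData(path, shrunkIsr, cachedZkVersion);
            System.out.println("ISR shrink written to ZooKeeper");
        } catch (KeeperException.BadVersionException e) {
            // The znode still exists but was modified by someone else (e.g. the controller);
            // re-reading the current state and version could let a later retry succeed.
            System.out.println("Stale cached version, need to re-read " + path);
        } catch (KeeperException.NoNodeException e) {
            // The znode was deleted (topic deletion). Repeating the same conditional update,
            // as in the log above, can never succeed.
            System.out.println("Znode gone: " + e.getPath());
        } finally {
            zk.close();
        }
    }
}
{noformat}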
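And here is a deliberately simplified, hypothetical sketch of the periodic retry pattern being described; the class and method names are made up for illustration and are not the actual kafka.cluster.Partition / ReplicaManager code. The shrink task runs on a fixed schedule, the conditional update keeps failing because the znode is gone, the cached in-memory state is never advanced, and so every subsequent tick attempts exactly the same doomed update:

{noformat}
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the retry behaviour, not Kafka's real classes.
public class IsrShrinkLoopSketch {

    // Stand-in for the per-partition state a broker caches in memory.
    static class CachedPartitionState {
        Set<Integer> isr = Set.of(1, 3, 2);
        int zkVersion = 68;
    }

    // Stand-in for a conditional ZooKeeper update: returns true only if the znode
    // exists and the expected version matches. Once the znode is deleted it can
    // only ever report failure.
    static boolean conditionalUpdate(String path, String data, int expectedVersion) {
        return false; // simulate the NoNodeException surfacing as a failed update
    }

    public static void main(String[] args) {
        CachedPartitionState cached = new CachedPartitionState();
        long replicaLagTimeMaxMs = 10_000; // replica.lag.time.max.ms

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // Decide the shrunken ISR from the cached (stale) view: replicas 3 and 2 look lagging.
            Set<Integer> shrunkIsr = Set.of(1);
            boolean updated = conditionalUpdate(
                    "/brokers/topics/foo/partitions/0/state",
                    "{\"isr\":" + shrunkIsr + "}",
                    cached.zkVersion);
            if (updated) {
                cached.isr = shrunkIsr;
                cached.zkVersion += 1; // only advanced on a successful write
            } else {
                // Failure path: the cached state is left untouched, so the next run retries
                // the exact same update against a znode that no longer exists.
                System.out.println("Cached zkVersion [" + cached.zkVersion
                        + "] not equal to that in zookeeper, skip updating ISR");
            }
        }, 0, replicaLagTimeMaxMs, TimeUnit.MILLISECONDS);
    }
}
{noformat}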