Hello,

Does anyone know how I can debug high ISR churn on the Kafka leader in a cluster with no traffic? I have 2 topics on a 4-node cluster (replication factor 4 and replication factor 3), and both show the number of in-sync replicas changing constantly:
[2017-03-22 15:30:10,945] INFO Partition [__consumer_offsets,0] on broker 2: Expanding ISR for partition __consumer_offsets-0 from 2,4 to 2,4,5 (kafka.cluster.Partition)
[2017-03-22 15:31:41,193] INFO Partition [__consumer_offsets,0] on broker 2: Shrinking ISR for partition [__consumer_offsets,0] from 2,4,5 to 2,5 (kafka.cluster.Partition)
[2017-03-22 15:31:41,195] INFO Partition [__consumer_offsets,0] on broker 2: Expanding ISR for partition __consumer_offsets-0 from 2,5 to 2,5,4 (kafka.cluster.Partition)
[2017-03-22 15:35:03,443] INFO Partition [__consumer_offsets,0] on broker 2: Shrinking ISR for partition [__consumer_offsets,0] from 2,5,4 to 2,5 (kafka.cluster.Partition)
[2017-03-22 15:35:03,445] INFO Partition [__consumer_offsets,0] on broker 2: Expanding ISR for partition __consumer_offsets-0 from 2,5 to 2,5,4 (kafka.cluster.Partition)
[2017-03-22 15:37:01,443] INFO Partition [__consumer_offsets,0] on broker 2: Shrinking ISR for partition [__consumer_offsets,0] from 2,5,4 to 2,4 (kafka.cluster.Partition)
[2017-03-22 15:37:01,445] INFO Partition [__consumer_offsets,0] on broker 2: Expanding ISR for partition __consumer_offsets-0 from 2,4 to 2,4,5 (kafka.cluster.Partition)

and

[2017-03-22 15:09:52,646] INFO Partition [topic1,0] on broker 5: Shrinking ISR for partition [topic1,0] from 5,2,4 to 5,4 (kafka.cluster.Partition)
[2017-03-22 15:09:52,648] INFO Partition [topic1,0] on broker 5: Expanding ISR for partition topic1-0 from 5,4 to 5,4,2 (kafka.cluster.Partition)
[2017-03-22 15:24:05,646] INFO Partition [topic1,0] on broker 5: Shrinking ISR for partition [topic1,0] from 5,4,2 to 5,4 (kafka.cluster.Partition)
[2017-03-22 15:24:05,648] INFO Partition [topic1,0] on broker 5: Expanding ISR for partition topic1-0 from 5,4 to 5,4,2 (kafka.cluster.Partition)
[2017-03-22 15:26:49,599] INFO Partition [topic1,0] on broker 5: Expanding ISR for partition topic1-0 from 5,4,2 to 5,4,2,1 (kafka.cluster.Partition)
[2017-03-22 15:27:46,396] INFO Partition [topic1,0] on broker 5: Shrinking ISR for partition [topic1,0] from 5,4,2,1 to 5,4,1 (kafka.cluster.Partition)
[2017-03-22 15:27:46,398] INFO Partition [topic1,0] on broker 5: Expanding ISR for partition topic1-0 from 5,4,1 to 5,4,1,2 (kafka.cluster.Partition)
[2017-03-22 15:45:47,896] INFO Partition [topic1,0] on broker 5: Shrinking ISR for partition [topic1,0] from 5,4,1,2 to 5,1,2 (kafka.cluster.Partition)
[2017-03-22 15:45:47,898] INFO Partition [topic1,0] on broker 5: Expanding ISR for partition topic1-0 from 5,1,2 to 5,1,2,4 (kafka.cluster.Partition)

I have tried increasing num.network.threads (now 8) and num.replica.fetchers (now 2), but nothing has changed.

The Kafka server config is:

default.replication.factor=4
log.retention.check.interval.ms=300000
log.retention.hours=168
log.roll.hours=24
log.segment.bytes=104857600
min.insync.replicas=2
num.io.threads=8
num.network.threads=15
num.partitions=1
num.recovery.threads.per.data.dir=1
num.replica.fetchers=2
offsets.topic.num.partitions=1
offsets.topic.replication.factor=3
replica.lag.time.max.ms=500
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=false
zookeeper.connection.timeout.ms=3000

Best regards,
Radu
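To quantify the churn in log excerpts like the ones above, here is a minimal Python sketch. It assumes the broker log uses exactly the kafka.cluster.Partition line format shown above (other Kafka versions may word these lines differently). For each shrink, it counts which follower was removed from the ISR, per topic-partition. It also checks one documented relationship: the Kafka broker configuration docs say replica.fetch.wait.max.ms (default 500 ms) should always be less than replica.lag.time.max.ms, to prevent frequent ISR shrinking on low-throughput topics. The config above sets replica.lag.time.max.ms=500. The helpers dropped_replicas, parse_properties and fetch_wait_below_lag are hypothetical names for illustration, not part of any Kafka tooling. The properties parser is deliberately simplified and handles no line continuations or escapes.

```python
import re
from collections import Counter

# Assumes the kafka.cluster.Partition ISR-change line format shown in the post.
ISR_RE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\] "
    r"on broker \d+: (?P<action>Shrinking|Expanding) ISR for partition "
    r"\S+ from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def dropped_replicas(lines):
    """Count how often each replica was removed from the ISR, keyed by (topic, partition, replica)."""
    drops = Counter()
    for line in lines:
        m = ISR_RE.search(line)
        if not m or m.group("action") != "Shrinking":
            continue
        old = set(m.group("old").split(","))
        new = set(m.group("new").split(","))
        # Replicas present before the shrink but missing after it were dropped.
        for replica in old - new:
            drops[(m.group("topic"), int(m.group("part")), int(replica))] += 1
    return drops


def parse_properties(text):
    """Simplified key=value parser (hypothetical helper; ignores blanks and # comments)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def fetch_wait_below_lag(props):
    """True if replica.fetch.wait.max.ms < replica.lag.time.max.ms, as the Kafka docs advise."""
    # Fallbacks are the assumed 0.10.x-era defaults (500 ms and 10000 ms).
    fetch_wait = int(props.get("replica.fetch.wait.max.ms", 500))
    lag_max = int(props.get("replica.lag.time.max.ms", 10000))
    return fetch_wait < lag_max


if __name__ == "__main__":
    sample = [
        "[2017-03-22 15:31:41,193] INFO Partition [__consumer_offsets,0] on broker 2: "
        "Shrinking ISR for partition [__consumer_offsets,0] from 2,4,5 to 2,5 (kafka.cluster.Partition)",
        "[2017-03-22 15:37:01,443] INFO Partition [__consumer_offsets,0] on broker 2: "
        "Shrinking ISR for partition [__consumer_offsets,0] from 2,5,4 to 2,4 (kafka.cluster.Partition)",
    ]
    for (topic, part, replica), n in sorted(dropped_replicas(sample).items()):
        print(f"{topic}-{part}: replica {replica} dropped {n} time(s)")
    props = parse_properties("replica.lag.time.max.ms=500\nmin.insync.replicas=2\n")
    print("fetch wait < lag time:", fetch_wait_below_lag(props))
```

To run it over a whole server.log from broker 2 or 5, pass an open file handle as `lines`. If one follower is dropped far more often than the others, that may point at that broker or its fetcher. If drops are spread evenly across all followers, a broker-wide setting is the more likely suspect.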