Hi Zach,

Any issues with partitions broker 2 is leader of?
Also, have you checked b2's server.log?

Cheers,

Liam Clarke-Hutchinson

On Wed, 1 Apr. 2020, 11:02 am Zach Cox, <zcox...@gmail.com> wrote:

> Hi - We have a small Kafka 2.0.0 (Zookeeper 3.4.13) cluster with 3 brokers:
> 0, 1, and 2. Each broker is in a separate rack (Azure zone).
>
> Recently there was an incident, where Kafka brokers and Zookeeper nodes
> restarted, etc. After that occurred, we've had problems where broker 2 is
> consistently out of many ISRs. A pattern we've observed is that broker 2
> will not be in any ISRs of partitions where broker 0 is leader, but will be
> in ISRs of partitions where broker 1 is leader. Then at some point the
> controller will change to a different broker, then 2 will not be in any
> ISRs where 1 is leader, but will be in ISRs where 0 is leader. Each time
> controller changes, this "flip flopping" of 2 in/out of ISRs changes. No
> matter what, 2 never seems to get into all ISRs.
>
> For topics with replicas=3, min.insync.replicas=2, and producers with
> acks=all, we only ever have ISR=(0,1), and occasionally 0 or 1 also briefly
> falls out of ISR, leading to producer retries and sometimes send failures
> for producers that use retries=3.
>
> Any ideas what might be happening here, and how we could fix it? Or
> additional data we could collect to try to diagnose the problem? We are
> planning to upgrade this cluster as soon as we get it working correctly.
>
> Thanks,
> Zach
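
On the "additional data we could collect" question, one possible way to track the pattern described above is to group the partitions where a broker is a replica but not in the ISR by their current leader, and rerun that after each controller change. The sketch below is only an illustration, not something from this thread: it assumes per-partition lines shaped like the output of `kafka-topics.sh --describe` (Topic / Partition / Leader / Replicas / Isr), the function name `missing_from_isr` is hypothetical, and the sample text with topic `events` is made up.

```python
import re
from collections import defaultdict

# Made-up sample, assumed to match the per-partition lines printed by
# `kafka-topics.sh --describe`; replace it with real output from your cluster.
SAMPLE = (
    "Topic: events\tPartition: 0\tLeader: 0\tReplicas: 0,1,2\tIsr: 0,1\n"
    "Topic: events\tPartition: 1\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,2,0\n"
    "Topic: events\tPartition: 2\tLeader: 2\tReplicas: 2,0,1\tIsr: 2,0,1\n"
)

# Matches one per-partition line; topic summary lines and lines whose leader
# is not numeric (for example "none") do not match and are skipped.
LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_ids(field):
    """Turn a comma-separated broker list such as '0,1,2' into a set of ints."""
    return {int(x) for x in field.split(",") if x}


def missing_from_isr(describe_output, broker):
    """Map leader id -> [(topic, partition)] where `broker` is an assigned
    replica but is absent from the ISR."""
    by_leader = defaultdict(list)
    for line in describe_output.splitlines():
        m = LINE.search(line)
        if m is None:
            continue
        topic, partition, leader = m.group(1), int(m.group(2)), int(m.group(3))
        replicas, isr = parse_ids(m.group(4)), parse_ids(m.group(5))
        if broker in replicas and broker not in isr:
            by_leader[leader].append((topic, partition))
    return dict(by_leader)


if __name__ == "__main__":
    for leader, parts in sorted(missing_from_isr(SAMPLE, 2).items()):
        print(f"leader {leader}: broker 2 missing from ISR of {parts}")
```

If every entry falls under a single leader and that leader changes when the controller moves, that would match the "flip flopping" described in the quoted message and narrow down which broker pair to look at in the logs.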