Re: Rebalancing stuck, never finishes

2021-02-26 Thread Sophie Blee-Goldman
Peter, It does seem like KAFKA-9752 is the most likely suspect, although if your clients were upgraded to 2.6.1 then I don't believe they would be on an early enough version of the JoinGroup to run into this. I'm not 100% sure though, it may be a good idea to leave a comment on that ticket and pin

Re: Rebalancing stuck, never finishes

2021-02-26 Thread Murilo Tavares
Just to provide a bit more detail, I noticed Peter's pattern: "Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null" "(Re-)joining group" But I also get a different pattern, interchangeably: Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable or invalid
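
To tell which of the two patterns dominates in a client log, a small script can count them. The following is only a sketch: the substrings are the ones quoted in the message above, while the labels, function names, and the sample list are made up for illustration. It also assumes the Java client's convention of reporting the coordinator connection as Integer.MAX_VALUE minus the broker id, which would make id 2147483646 correspond to broker 1.

```python
import re
from collections import Counter

# Substrings quoted in the thread; the labels are hypothetical names for this sketch.
PATTERNS = {
    "rebalance_failed": "Rebalance failed",
    "rejoining": "(Re-)joining group",
    "coordinator_unavailable": "is unavailable or invalid",
}

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def count_patterns(lines):
    """Count how many lines contain each of the known substrings."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


def coordinator_broker_id(line):
    """Map the '(id: N ...' field of a coordinator log line back to a broker id.

    Assumption: the client logs the coordinator as Integer.MAX_VALUE - broker id.
    Returns None when the line has no such field.
    """
    match = re.search(r"\(id: (\d+)", line)
    if match is None:
        return None
    return INT_MAX - int(match.group(1))


# Sample lines built only from the messages quoted in the thread.
sample = [
    "Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null",
    "(Re-)joining group",
    "Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable or invalid",
]

counts = count_patterns(sample)
broker_id = coordinator_broker_id(sample[2])
```

In practice `sample` would be replaced by the lines of a real client log file, for example via `open(path).read().splitlines()`.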

Re: Rebalancing stuck, never finishes

2021-02-26 Thread Murilo Tavares
Hi, I got the same behaviour yesterday while trying to upgrade my KafkaStreams app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1. KAFKA-9752 mentions two other tickets as the cause: https://issues.apache.org/jira/browse/KAFKA-7610 https://issues.apache.org/jira/browse/KAFKA-9232 A

Re: Rebalancing stuck, never finishes

2021-02-26 Thread Péter Sinóros-Szabó
Hey Sophie, thanks for the link. I was checking that ticket, but I was not sure if it is relevant for our case. Eventually we "fixed" our problem by reducing the session.timeout.ms (it was set to a high value for other reasons). But today, in another service, we faced the same problem when upgr
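
For readers wondering what the session.timeout.ms workaround might look like, here is a minimal sketch. The thread does not give the values that were used, so every number and the group name below are assumptions chosen for illustration. The config keys themselves are standard Kafka consumer settings, and the check reflects the documented guidance that heartbeat.interval.ms should typically be no higher than one third of session.timeout.ms.

```python
# Hypothetical consumer settings; the values are illustrative, not taken from the thread.
consumer_config = {
    "group.id": "example-group",      # assumed name
    "session.timeout.ms": 10_000,     # reduced from some larger value
    "heartbeat.interval.ms": 3_000,
}


def check_timeouts(config):
    """Return a list of warnings about the heartbeat/session timeout relationship."""
    warnings = []
    session = config["session.timeout.ms"]
    heartbeat = config["heartbeat.interval.ms"]
    # Guidance from the Kafka consumer docs: heartbeat <= session / 3.
    if heartbeat * 3 > session:
        warnings.append(
            f"heartbeat.interval.ms={heartbeat} is more than a third of "
            f"session.timeout.ms={session}"
        )
    return warnings


warnings = check_timeouts(consumer_config)
```

A lower session timeout lets the coordinator notice a member that went away without leaving the group sooner, which may shorten a rebalance during a rolling restart. It works around the symptom only and does not replace the broker-side fix discussed elsewhere in the thread.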

Re: Rebalancing stuck, never finishes

2021-02-25 Thread Sophie Blee-Goldman
Hey Peter, It does sound like you may have hit https://issues.apache.org/jira/browse/KAFKA-9752. You will need to upgrade your brokers in order to get the fix, since it's a broker-side issue. On Tue, Feb 9, 2021 at 2:48 AM Péter Sinóros-Szabó wrote: > Hi, > > I have an application running with 6

Rebalancing stuck, never finishes

2021-02-09 Thread Péter Sinóros-Szabó
Hi, I have an application running as 6 instances on Kubernetes. All 6 instances (pods) are the same, using the same consumer group id. Recently we have seen that when the application is restarted (rolling restart on K8s), the triggered rebalancing sometimes doesn't finish at all and the Kafka Cl