[ https://issues.apache.org/jira/browse/KAFKA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-9815.
------------------------------------
    Fix Version/s: 2.4.2
                   2.5.0
       Resolution: Fixed

> Consumer may never re-join if inconsistent metadata is received once
> --------------------------------------------------------------------
>
>                 Key: KAFKA-9815
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9815
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Major
>             Fix For: 2.5.0, 2.4.2
>
>
> KAFKA-9797 is the result of an incorrect rolling upgrade test where a new
> listener is added to brokers and set as the inter-broker listener within the
> same rolling upgrade. As a result, metadata is inconsistent across brokers
> until the rolling upgrade completes since interbroker communication is broken
> until all brokers have the new listener. The test fails due to consumer
> timeouts and sometimes this is because the upgrade takes longer than consumer
> timeout. But several logs show an issue with the consumer when one metadata
> response received during upgrade is different from the consumer's cached
> `assignmentSnapshot`, triggering rebalance.
>
> In
> [https://github.com/apache/kafka/blob/7f640f13b4d486477035c3edb28466734f053beb/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L750],
> we return true for `rejoinNeededOrPending()` if `assignmentSnapshot` is not
> the same as the current `metadataSnapshot`. We don't set `rejoinNeeded` in
> the instance, but we revoke partitions and send JoinGroup request. If the
> JoinGroup request fails and a subsequent metadata response contains the same
> snapshot value as the previously cached `assignmentSnapshot`, we never send
> `JoinGroup` again since snapshots match and `rejoinNeeded=false`. Partitions
> are not assigned to the consumer after this and the test fails because
> messages are not received.
>
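
The sequence described above can be modelled with a small sketch. This is a toy model under stated assumptions, not the real `ConsumerCoordinator` code: the class `CoordinatorSketch`, every method other than `rejoinNeededOrPending()`, and the `requestRejoinOnMismatch` switch are hypothetical names introduced for illustration. The switch shows one possible way to make the rejoin request persistent; it is not necessarily the change that was merged for this issue.

```java
import java.util.List;
import java.util.Objects;

// Toy model of the state named in the report. NOT the real ConsumerCoordinator.
class CoordinatorSketch {
    // Hypothetical switch: when true, a snapshot mismatch also sets rejoinNeeded.
    private final boolean requestRejoinOnMismatch;
    private boolean rejoinNeeded = false;
    private boolean partitionsAssigned = true;
    private List<String> assignmentSnapshot;
    private List<String> metadataSnapshot;

    CoordinatorSketch(List<String> initialSnapshot, boolean requestRejoinOnMismatch) {
        this.assignmentSnapshot = initialSnapshot;
        this.metadataSnapshot = initialSnapshot;
        this.requestRejoinOnMismatch = requestRejoinOnMismatch;
    }

    // A metadata response replaces the current metadata snapshot.
    void onMetadataUpdate(List<String> snapshot) {
        this.metadataSnapshot = snapshot;
    }

    boolean rejoinNeededOrPending() {
        if (!Objects.equals(assignmentSnapshot, metadataSnapshot)) {
            // Reported behaviour: return true without recording the request.
            // Sketched alternative: remember it, so it survives a failed JoinGroup.
            if (requestRejoinOnMismatch) {
                rejoinNeeded = true;
            }
            return true;
        }
        return rejoinNeeded;
    }

    // One poll iteration; joinGroupSucceeds stands in for the broker's answer.
    void poll(boolean joinGroupSucceeds) {
        if (!rejoinNeededOrPending()) {
            return; // no JoinGroup is sent
        }
        partitionsAssigned = false; // partitions are revoked before rejoining
        if (joinGroupSucceeds) {
            rejoinNeeded = false;
            assignmentSnapshot = metadataSnapshot;
            partitionsAssigned = true;
        }
    }

    boolean hasAssignment() {
        return partitionsAssigned;
    }

    // Replays the scenario from the report; returns true if the consumer ends up
    // without partitions even though a later JoinGroup would have succeeded.
    static boolean stuckAfterFailedJoin(boolean requestRejoinOnMismatch) {
        List<String> consistent = List.of("topic-0");
        CoordinatorSketch c = new CoordinatorSketch(consistent, requestRejoinOnMismatch);
        c.onMetadataUpdate(List.of());  // one inconsistent metadata response
        c.poll(false);                  // partitions revoked, JoinGroup fails
        c.onMetadataUpdate(consistent); // metadata matches the cached snapshot again
        c.poll(true);                   // would succeed, if a JoinGroup were sent
        return !c.hasAssignment();
    }

    public static void main(String[] args) {
        System.out.println("stuck without flag: " + stuckAfterFailedJoin(false));
        System.out.println("stuck with flag:    " + stuckAfterFailedJoin(true));
    }
}
```

With the switch off, the second `poll` returns early because the snapshots match and `rejoinNeeded` is still false, so the model consumer stays without partitions. With the switch on, the pending request survives the failed JoinGroup and the consumer rejoins.
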
> Even though this particular system test isn't a valid upgrade scenario, we
> should fix the consumer, since temporary metadata differences between brokers
> can result in this scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)