[ https://issues.apache.org/jira/browse/KAFKA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301956#comment-17301956 ]
Ron Dagostino commented on KAFKA-12455:
---------------------------------------

The test is using the default value of metadata.max.age.ms=300000 (5 minutes). When I explicitly turn it down to metadata.max.age.ms=5000 (5 seconds), the test passes for Raft but then fails for ZK (2 unexpected group rebalances in that case). I increased it to 10 seconds, and then the Raft configuration failed with 3 unexpected rebalances and the ZK configuration failed with 1 unexpected rebalance. I decreased it to a very aggressive 1 second, and they both passed.

We have historically seen some flakiness in the ZooKeeper version of this test, and the fact that the test suddenly failed when we set metadata.max.age.ms to 5 or 10 seconds indicates that it is just plain luck that the test is passing today. Given that the current client-side code doesn't fall back to the bootstrap brokers when it sees no brokers available, I think any test really needs to make it *impossible* for the client to see cluster metadata with just a single broker. Decreasing the metadata max age decreases the possibility of that happening but doesn't make it impossible.

Another experiment was to keep metadata.max.age.ms=300000 but define session.timeout.ms=30000 instead of the 10000 the test was setting before. This is longer than the broker roll time, and in fact this change allows both configurations to pass.

A further experiment was to keep metadata.max.age.ms=300000 and session.timeout.ms=10000 but expand to 3 brokers instead of just 2. This should fix the issue, since there would never be a situation where just 1 broker is available, and a METADATA response would always have at least 2 brokers for the consumer to use. Both configurations pass.
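The broker-count argument above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not code from the Kafka system tests: the function names and the `down_at_once` parameter are hypothetical, and the sketch assumes a rolling bounce takes down one broker at a time.

```python
# Hypothetical helpers illustrating the broker-count reasoning; these names
# do not exist in Kafka or in consumer_test.py.

def min_brokers_in_metadata(num_brokers: int, down_at_once: int = 1) -> int:
    """Fewest brokers a METADATA response can list during a rolling bounce.

    Assumes `down_at_once` brokers are offline at any moment of the roll.
    """
    return max(num_brokers - down_at_once, 0)


def client_can_be_stranded(num_brokers: int, down_at_once: int = 1) -> bool:
    """True if the client may cache metadata that lists only a single broker.

    If that single broker is the next one to be bounced, a client that does
    not fall back to its bootstrap brokers has nowhere left to connect.
    """
    return min_brokers_in_metadata(num_brokers, down_at_once) <= 1


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "brokers -> can be stranded:", client_can_be_stranded(n))
```

With 2 brokers the check reports that the client can be stranded, which matches the failures described above; with 3 brokers it cannot, which is why expanding the cluster removes the problem rather than merely making it less likely.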
> OffsetValidationTest.test_broker_rolling_bounce failing for Raft quorums
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-12455
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12455
>             Project: Kafka
>          Issue Type: Bug
>   Affects Versions: 2.8.0
>            Reporter: Ron Dagostino
>            Assignee: Ron Dagostino
>            Priority: Blocker
>
> OffsetValidationTest.test_broker_rolling_bounce in `consumer_test.py` is
> failing because the consumer group is rebalancing unexpectedly.