Hongten opened a new pull request, #21654: URL: https://github.com/apache/kafka/pull/21654
**Problem** When a broker is down and executeAssignment() runs, incrementalAlterConfigs can throw TimeoutException because it tries to set broker-level throttles (inter-broker and log-dir) on brokers that are no longer live. The AdminClient sends config updates directly to each broker; if a broker is down, the request times out and the reassignment fails. **Solution** Apply throttle configuration only to brokers that are currently live. Live brokers are determined via adminClient.describeCluster().nodes(). Throttle configs are no longer sent to brokers that are not in this set, so reassignment can proceed even when some brokers are down. I provided another solution before, Refer to https://github.com/apache/kafka/pull/21388, it will add a new parameter `--broker-list-without-throttle` to remove the broker, which is down. The new solution is better I think. **Changes** 1. getLiveBrokerIds() helper Returns the set of broker IDs from describeCluster().nodes(). 2. modifyInterBrokerThrottle() Filters reassigningBrokers to only live brokers before calling incrementalAlterConfigs. Returns early if no brokers are live. 3. modifyLogDirThrottle() Filters movingBrokers to only live brokers before calling incrementalAlterConfigs. Returns early if no brokers are live. 4. clearAllThrottles() Clears broker-level throttles only for live brokers. Uses retainAll(liveBrokers) so we never call incrementalAlterConfigs on down brokers. **Testing** Unit tests in ReassignPartitionsUnitTest: 1. testModifyInterBrokerThrottleSkipsDownBrokers – MockAdminClient with 3 brokers; modifyInterBrokerThrottle is called with brokers 0–3. Verifies throttle is applied only to 0, 1, 2 and broker 3 is skipped. 2. testModifyLogDirThrottleSkipsDownBrokers – Same setup for modifyLogDirThrottle. 3. testExecuteAssignmentWithThrottleWhenBrokerDown – Full executeAssignment with throttle on a 3-broker cluster; verifies reassignment succeeds and throttle is applied only to live brokers. 4. testModifyInterBrokerThrottleWhenAllBrokersDown – All reassigning brokers are down; verifies no exception and early return without calling incrementalAlterConfigs. MockAdminClient with 3 brokers is used to simulate broker 3 being down (not in describeCluster().nodes()). Without the fix, incrementalAlterConfigs for broker 3 would fail with InvalidRequestException in the mock (or TimeoutException in a real cluster). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
