Hongten opened a new pull request, #21654:
URL: https://github.com/apache/kafka/pull/21654

   **Problem**
   When a broker is down and executeAssignment() runs, incrementalAlterConfigs 
can throw TimeoutException because it tries to set broker-level throttles 
(inter-broker and log-dir) on brokers that are no longer live. The AdminClient 
sends config updates directly to each broker; if a broker is down, the request 
times out and the reassignment fails.
   
   **Solution**
   Apply throttle configuration only to brokers that are currently live. Live 
brokers are determined via adminClient.describeCluster().nodes(). Throttle 
configs are no longer sent to brokers that are not in this set, so reassignment 
can proceed even when some brokers are down.
   I provided another solution before, Refer to 
https://github.com/apache/kafka/pull/21388, it will add a new parameter 
`--broker-list-without-throttle` to remove the broker, which is down. 
   The new solution is better I think.
   
   **Changes**
   1. getLiveBrokerIds() helper
   Returns the set of broker IDs from describeCluster().nodes().
   2. modifyInterBrokerThrottle()
   Filters reassigningBrokers to only live brokers before calling 
incrementalAlterConfigs.
   Returns early if no brokers are live.
   3. modifyLogDirThrottle()
   Filters movingBrokers to only live brokers before calling 
incrementalAlterConfigs.
   Returns early if no brokers are live.
   4. clearAllThrottles()
   Clears broker-level throttles only for live brokers.
   Uses retainAll(liveBrokers) so we never call incrementalAlterConfigs on down 
brokers.
   
   **Testing**
   Unit tests in ReassignPartitionsUnitTest:
   1. testModifyInterBrokerThrottleSkipsDownBrokers – MockAdminClient with 3 
brokers; modifyInterBrokerThrottle is called with brokers 0–3. Verifies 
throttle is applied only to 0, 1, 2 and broker 3 is skipped.
   2. testModifyLogDirThrottleSkipsDownBrokers – Same setup for 
modifyLogDirThrottle.
   3. testExecuteAssignmentWithThrottleWhenBrokerDown – Full executeAssignment 
with throttle on a 3-broker cluster; verifies reassignment succeeds and 
throttle is applied only to live brokers.
   4. testModifyInterBrokerThrottleWhenAllBrokersDown – All reassigning brokers 
are down; verifies no exception and early return without calling 
incrementalAlterConfigs.
   
   MockAdminClient with 3 brokers is used to simulate broker 3 being down (not 
in describeCluster().nodes()). Without the fix, incrementalAlterConfigs for 
broker 3 would fail with InvalidRequestException in the mock (or 
TimeoutException in a real cluster).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to