Re: [PR] KAFKA-17397: Ensure ClassicKafkaConsumer sends leave request on close even if interrupted [kafka]

via GitHub Tue, 17 Feb 2026 18:33:15 -0800


chickenchickenlove commented on PR #21332:
URL: https://github.com/apache/kafka/pull/21332#issuecomment-3918157708

@lianetm
Thanks a lot for the comment!
Everything you pointed out sounds reasonable to me.
I’d like to discuss a couple of points a bit further.

> On top of that, changing the logic that prioritized metadata request seems
very risky. E.g, could we be filling up the max_in_flight with this leave
request that we want to allow to go before metadata, so delaying metadata in
new unintended ways? Not sure, needs more thought, but I wouldn't change this
without a good motivation.

1. Whether we must wait for ApiKeys.LEAVE_GROUP when metadata is expired
From my perspective, the information we get from refreshing metadata doesn’t
seem to be directly related to what ApiKeys.LEAVE_GROUP needs. That said, one
possibility is that the group coordinator instance mapped to a brokerId could
change, and in that case leaving might end up failing anyway, especially if it
coincides with a channel disconnect/reconnect scenario.
Am I missing any dependency here?

2. Risk that LEAVE_GROUP consumes in_flight_request and delays METADATA
It doesn’t seem that the information obtained from METADATA is strongly
related to what KafkaConsumer.close() needs. LEAVE_GROUP appears to require
only the group coordinator (and its instance identifier), so my assumption was
that even if LEAVE_GROUP temporarily occupies in-flight slots, it might not
have a meaningful negative impact on the subsequent flow.

That said, as you mentioned, it’s possible that this introduces an
unexpected delay path, so I’d appreciate your thoughts on whether this
assumption is valid.

> Then I do think we should review the best-effort approach to send the
leave group. Can we do better and retry/wait while there is time? (not to
address this interrupt scenario, here there's not much we can do, but to ensure
we retry leaving when unsubscribing or close with time maybe). I had filed a
jira for this a long time ago actually
https://issues.apache.org/jira/browse/KAFKA-15954 , we can maybe resurface that
separately.

At the same time, it seems there may be room to mitigate the current
behavior at a higher level. Right now, after sending LEAVE_GROUP from
AbstractCoordinator, we call pollNoWakeUp only once, and it looks like this can
cause LEAVE_GROUP to miss its chance to actually be sent when metadata happens
to be expired at that moment.

https://github.com/apache/kafka/blob/f568932e45dd055bc7b95b3f2f0e3ca09ca0ee57/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L1170-L1188

For example, if AbstractCoordinator had a simple flag (or a similar signal)
to indicate whether LEAVE_GROUP was sent/processed, we might be able to call
pollNoWakeUp a few more times to increase the chances of sending it, or make
leaving more reliable when there’s still time budget available. What do you
think about this approach?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] KAFKA-17397: Ensure ClassicKafkaConsumer sends leave request on close even if interrupted [kafka]

Reply via email to