I believe we've figured out the root cause of KAFKA-12896 <https://issues.apache.org/jira/browse/KAFKA-12896>, and should have a fix prepared shortly. See the linked issues for more details.

Regarding KIP-726 itself: given that the latest proposal is fully compatible and does not require any breaking changes, we may or may not push to get this into 3.0. Cooperative rebalancing is a huge improvement for the majority of consumer applications, but it's already available for those who want it, so I don't feel too bad about letting it slip from 3.0 in order to focus on other things. I would also personally feel better about merging it at the beginning of 3.1, so that we have the entire release cycle to flush out any potential regressions, rather than rushing it in at the last moment (even though the majority of the work has been done for a while). If anyone feels strongly about getting this into 3.0, please let me know, as it's definitely still within reach. Otherwise I'll probably aim for 3.1 instead.

Note (@Luke in particular): we could opt for a partial, risk-free improvement in 3.0 that would make it easier for users to turn on cooperative rebalancing without actually enabling it for them (yet). If we change the default config to ["range", "cooperative-sticky"] in 3.0, then the RangeAssignor will remain the default, but any application can upgrade to the CooperativeStickyAssignor with just a single rolling bounce (to remove the "range" assignor) rather than the usual two (one to add "cooperative-sticky" alongside the existing assignor, and one to remove the old assignor). We can then go ahead and fully make the CooperativeStickyAssignor the default assignor in 3.1 by changing the default config to ["cooperative-sticky", "range"], as this KIP has proposed.

Thoughts?

On Thu, Jun 10, 2021 at 1:46 AM Luke Chen <show...@gmail.com> wrote:
> Hi Ryan,
>
> Thanks for your good comments. I've listed them in "Rejected Alternatives" in the KIP.
>
> 1. Some cooperative-sticky related defects might not be fixed before V3.0.
> → We've marked important defects as blockers for V3.0, e.g. KAFKA-12896. Please raise any important defects you find.
>
> 2. Cooperative-sticky assignor is also very new for C/C++ users in librdkafka, so not many in that community have tried incremental cooperative yet. And bugs are still recently being worked out there too.
> → Thanks for raising this. I checked this library and found that currently only 1 cooperative-sticky related bug is open, which is good (10 bugs are fixed). Anyway, I think clients can always change the assignor to another one if there are still bugs in the library.
>
> Thank you.
> Luke
>
> On Thu, Jun 10, 2021 at 12:40 PM Ryan Leslie <rles...@bloomberg.net> wrote:
>
> > Thanks for the quick replies, Luke and Sophie.
> >
> > I've not voted, but I agree with accepting the KIP since it's a superior feature. I was mostly just reacting to this comment, since it didn't mention open issues:
> >
> > > > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully enable cooperative rebalancing by default in 3.0 if we have KAFKA-12477 done in time, and if we don't then there's no harm as it's not going to change the behavior.
> >
> > But I see now, as Luke said, that the main issue is already considered a blocker, so it was assumed. Though I did also wonder whether bugs that may have existed since several versions ago should actually hold up 3.0, which I know is especially about moving away from ZooKeeper.
> >
> > My sentiment was just that during the many release cycles of Kafka since cooperative was introduced, there have been issues discovered. And that makes sense given that the implementation was complex and quite a lot of code changed to make it happen. Hopefully the last of the kinks will have been worked out before 3.0. I just wondered whether it should be a default in 3.0 if it hasn't yet been free of defects for a significant period of time. KIP-726 doesn't list any drawbacks for cooperative-sticky, but perhaps this is one. I also appreciate that it's already been successfully adopted by many, particularly Streams / Connect users. But this may also be where the feature has the most benefit, due to expensive setup/teardown during rebalances, and stop-the-world can be less of a concern for many "regular consumers".
> >
> > This is probably irrelevant here, but another thing to mention is that the feature is also very new for C/C++ users in librdkafka, so not many in that community have tried incremental cooperative yet. And bugs are still recently being worked out there too.
> >
> > Just playing devil's advocate here, sorry to come across as a negative nancy!
> >
> > On 2021/06/09 00:05:41, Sophie Blee-Goldman <sop...@confluent.io.INVALID> wrote:
> > > Hey Ryan,
> > >
> > > Yes, I believe any open bugs regarding the cooperative-sticky assignor should be considered as blockers to it being made the default, if not blockers to the release in general. I don't think they need to block the acceptance of this KIP, though, just possibly the implementation of it.
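
To make the single-bounce argument in Sophie's proposal concrete, here is a minimal Java sketch that models how a consumer group settles on an assignor. It assumes the broker-side selection works roughly like this: only assignors listed by every member are candidates, each member votes for its most-preferred candidate, and the most-voted candidate wins. The class and method names (`AssignorSelectionSketch`, `selectAssignor`) are hypothetical, the short names "range" and "cooperative-sticky" stand in for the `RangeAssignor` and `CooperativeStickyAssignor` classes, and the model ignores which rebalance protocol (eager or cooperative) each member follows during the transition.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class AssignorSelectionSketch {

    /**
     * Hypothetical, simplified model of assignor selection for one consumer group.
     * Each inner list is one member's assignor list, in preference order.
     * Returns empty when no assignor is listed by every member.
     */
    static Optional<String> selectAssignor(List<List<String>> members) {
        if (members.isEmpty()) {
            return Optional.empty();
        }
        // 1. Candidates: assignors that every member lists.
        Set<String> candidates = new LinkedHashSet<>(members.get(0));
        for (List<String> member : members) {
            candidates.retainAll(member);
        }
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        // 2. Each member votes for its most-preferred candidate.
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (List<String> member : members) {
            for (String assignor : member) {
                if (candidates.contains(assignor)) {
                    votes.merge(assignor, 1, Integer::sum);
                    break;
                }
            }
        }
        // 3. Most votes wins. This sketch breaks ties by first-counted order;
        //    that tie rule is an assumption, not the broker's documented behavior.
        String winner = null;
        int best = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                winner = e.getKey();
            }
        }
        return Optional.of(winner);
    }

    public static void main(String[] args) {
        List<String> proposed30Default = List.of("range", "cooperative-sticky");
        List<String> proposed31Default = List.of("cooperative-sticky", "range");
        List<String> cooperativeOnly = List.of("cooperative-sticky");
        List<String> oldDefault = List.of("range");

        // Everyone on the proposed 3.0 default: "range" stays in use.
        System.out.println(selectAssignor(List.of(proposed30Default, proposed30Default, proposed30Default)));
        // Mid-bounce: one member has dropped "range"; cooperative-sticky is the only common assignor.
        System.out.println(selectAssignor(List.of(cooperativeOnly, proposed30Default, proposed30Default)));
        // Jumping straight from the old ["range"] default: no common assignor at all.
        System.out.println(selectAssignor(List.of(cooperativeOnly, oldDefault, oldDefault)));
        // Everyone on the proposed 3.1 default: cooperative-sticky is preferred.
        System.out.println(selectAssignor(List.of(proposed31Default, proposed31Default, proposed31Default)));
    }
}
```

Under these assumptions the program prints `Optional[range]`, `Optional[cooperative-sticky]`, `Optional.empty` and `Optional[cooperative-sticky]`. The third case is why the usual upgrade takes two bounces: a member that jumps from ["range"] straight to ["cooperative-sticky"] shares no assignor with the rest of the group, and the real coordinator is expected to reject it, whereas with ["range", "cooperative-sticky"] already in place, removing "range" in one bounce always leaves a common assignor.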