whew, long response from Matthias :P Lot to digest but I want to add on/respond to a few points:
> If they want to be "advantageous", they could make it a two step upgrade I guess, and go from 2.5 (or older) directly to 3.x and apply all required code changes in a single upgrade step, and repeat the same to upgrade to 4.0. But I would not necessarily recommend to do an non-API compatible upgrade directly, and for sure officially discourage it for two major releases.

Are we still talking about only API compatibility here? Because I'm not so sure why we would officially discourage upgrading across 2 major releases as long as their code is compatible. Of course if you're referring to possible gotchas from upgrading over such a long period, that's worth discussing, but it's independent of API compatibility. IMO API compatibility is a binary thing: either it's possible to do a direct upgrade or it's not. Why do we have to officially recommend anything?

> Or we can distinguish between ALOS and EOS and have an "bridge release version range" for both cases.

I like this idea. EOS and ALOS are very distinct in Streams and may only become more divided over time. It's worth calling them out as separate cases.

Now regarding the eager/cooperative rebalancing protocol thing in Streams:

As Matthias said, we hope to officially drop support for eager rebalancing in 4.0, and I've prepared a PR for this already: https://github.com/apache/kafka/pull/18988

This does have the effect of forcing a bridge release for users hoping to upgrade directly from 2.3 or below to 4.0+, and users will have to follow a specific upgrade path to do so as outlined in the PR description. Assuming we fit that into 4.0, it should definitely be called out in this KIP. (Basically users need to use the `upgrade.from` config to first upgrade to the bridge release, then go on to 4.0.)

There are also other runtime incompatibilities that have been introduced into Streams over the years that restrict direct live upgrades across certain versions.
It would also be good to call this out in the KIP and point to the `upgrade.from` config, though we can point to the Streams upgrade guide for details rather than try to reiterate everything here.

On Thu, Feb 20, 2025 at 5:21 PM Matthias J. Sax <mj...@apache.org> wrote:

> Hello,
>
> took me some time, and sorry for the long email, but it's complicated...
>
> First, I just re-read the latest version of the KIP. Thanks for all the updates.
>
> One thing that I am missing in the motivation is that we really want to stop supporting direct upgrades from older versions, to cut down our upgrade matrix for testing. The motivation does somewhat touch on it, but I think it would be good to be more explicit. Even if it isn't something users care about, it's a second main motivation for us, in addition to the complexity to actually keep the versions compatible.
>
> I also want to further clarify my understanding of the KIP. The goal is not to define what upgrades are possible, right? What is possible is much more nuanced. -- But we rather want to define what we recommend? Is this understanding correct? If yes, it might also be worth adding to the motivation section.
>
> I also think we actually need to more explicitly distinguish three categories of compatibility, but did so far only discuss two of them. Even if the KIP does mention all three. Ideally, we should have a section in the motivation, explaining the three different types of compatibility, and explicitly state which one this KIP is concerned with, and which ones it's not concerned with.
>
> (1) protocol compatibility: ie, what client-broker versions are compatible
>
> This one is not in the focus of the KIP, but it might still be good to be explicit about it. Could be explained in the motivation for completeness, and maybe refer to KIP-896 for 4.x related changes.
>
> Btw: there are also some additional limitations for KS-broker compatibility:
>
> https://kafka.apache.org/39/documentation/streams/upgrade-guide#streams_api_broker_compat
>
> Many of you know this, but wanted to mention it for completeness. Not sure if we need to mention it on the KIP.
>
> (2) API compatibility (ie Java/Scala API).
>
> This is only mentioned briefly in the KIP, and again, it's not the core of the KIP, but I think it is still important to include it more explicitly, because we talk about "bridge version".
>
> Given the rule that we are allowed to break API compatibility in major release, but still guarantee API compatibility for the last three minor releases, it can be confusing and it would be great to explain it better.
>
> In the end, directly upgrading from 2.5 or older to 4.x is practically impossible as we went through two major releases which did remove deprecated APIs, and I would not recommend to do such a direct upgrade.
>
> From an API POV, if one is on 2.5 or older, they should first upgrade to 2.6/2.7/2.8, and then lazily migrate off any older stuff that is removed with 3.0. Afterwards, they can upgrade to 3.7/3.8/3.9 following the same pattern, and only upgrade to 4.0 in a third step.
>
> If they want to be "advantageous", they could make it a two step upgrade I guess, and go from 2.5 (or older) directly to 3.x and apply all required code changes in a single upgrade step, and repeat the same to upgrade to 4.0. But I would not necessarily recommend to do an non-API compatible upgrade directly, and for sure officially discourage it for two major releases.
>
> Thus, the information in the KIP about "bridge version" to 4.x being "2.4.x - 3.9.x" seems to fall short, and mentioning
>
> > To minimize code refactoring, we recommend the following bridge versions that maintain API compatibility with Kafka 4.x:
> >
> > Kafka Client: 3.3.x - 3.9.x
> > Kafka Streams: 3.6.x - 3.9.x
>
> seems not to be sufficient to me.
>
> Hence, the provided "Upgrade Examples" might be oversimplified, and we might want to refine them.
>
> (3) Runtime compatibility. This one is specific to Kafka Streams, but not to clients from my understanding. Clients are stateless and thus they don't face any issue, but Kafka Streams is stateful, and thus needs to take care of it. Please correct me if I am wrong.
>
> The KIP so far seems to only consider this one, and what is proposed makes sense to me on a high level. However, I am confused why Kafka Clients are mentioned here, too, as this type of compatibility should not really be relevant for them? Even if clients might also have some semantic changes, these should always align with API changes (ie, old deprecated API might have slightly different semantics than new API).
>
> Now about the currently proposed ranges for Kafka Streams:
>
> > Kafka Streams
> > Current Version 0.11.x - 2.3.x
> > Bridge Release 2.4.x - 3.9.x
> > Target Version 4.x
>
> This could make sense for "eager vs cooperative" rebalancing, however, at the current point, we did not remove "eager" in 4.0 yet. I was actually just syncing up with Sophie about it, and it was a slip, and we want to propose to remove "eager" in 4.0 (Sophie will prepare a PR), so we can avoid keeping "eager" until 5.0.
>
> We did officially deprecate "eager" in 3.1 release, so we are covered to actually remove it with 4.0.
>
> If we would not drop "eager", using `2.4.x to 3.9.x` would not make sense though. If we keep "eager" in 4.0, user can still upgrade from 2.0.x to 4.0.x w/o issues from a runtime perspective.
>
> If we drop "eager" we also need to drop the corresponding system tests that upgrade to 4.0, and also stop testing upgrading from "eager to cooperative" with 3.9 being the highest target version in this system test. And if we don't test it, it's not officially supported any longer... (even if people could still upgrade via an offline upgrade -- what really breaks if we remove "eager" is "only" the online [two-]rolling bounce upgrade...)
>
> However, there is another change we want to consider: we did remove EOSv1 in 4.0 release, which was replaced with EOSv2 in Kafka Streams 2.6 via KIP-447.
>
> Thus, for EOSv1 users, they cannot directly upgrade to 4.0 either, but only EOSv2 users can. Thus, it might make sense to actually use "bridge releases 2.6.x - 3.9.x" just to keep it simple... Or we can distinguish between ALOS and EOS and have an "bridge release version range" for both cases.
>
> Btw: using EOSv2 required broker version 2.5+, that we might also want to call out.
>
> Last but not least, while we are very explicit in the KS upgrade docs, it might be worth to call out that some upgrades require a two-rolling bounce approach, and users should always consult the upgrade docs... We use two-rolling bounce upgrade to bridge runtime backward incompatible changes (similar to what we do broker side, when IBP version is bumped).
>
> So overall, it seems that we need to really have two guidelines, not just one? One for API compatibility, which is much stricter, and one for runtime compatibility?
>
> If we really want to make a recommendation that is most easy to understand, we might want to only go with API compatibility. Not sure if this might be "too restrictive" though?
>
> Curious to get your thoughts on all this.
>
> -Matthias
>
> On 2/19/25 5:51 PM, Kuan Po Tseng wrote:
> > Hi Lianet,
> >
> > Thank you for your feedback!
> >
> > Yes, the current KIP focuses solely on the client upgrade for 4.x.
> > I have updated the title accordingly and also included the KS upgrade link in the KIP.
> >
> > Thanks again!
> >
> > Best regards,
> > Kuan-Po
> >
> > On 2025/02/19 16:59:25 "Lianet M." wrote:
> >> Hello all, sorry a bit late, just minor comments on this one:
> >>
> >> - Should we clarify in the title or at the beginning of the KIP that it is proposing a client upgrade path for 4.x? The broader considerations for upgrades discussed in this thread will be tackled separately (seems we all agree).
> >>
> >> - The KS upgrade path seems to be the tricky one, and all that the user needs to consider to successfully follow the provided path for KS is not clear in the KIP, but it's all well explained on the KS upgrade notes for 3.9, should we add a ref to that?
> >> https://kafka.apache.org/39/documentation/streams/upgrade-guide
> >>
> >> Thanks Kuan Po!
> >>
> >> Lianet
> >>
> >> On Tue, Feb 11, 2025 at 11:22 AM Kuan Po Tseng <brandb...@gmail.com> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> If there are no other opinions, I would like to start a vote tomorrow, thank you!
> >>>
> >>> Best,
> >>> Kuan Po
> >>>
> >>> On Sat, Feb 8, 2025 at 1:51 AM Kuan Po Tseng <brandb...@apache.org> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Based on our discussion, I added a section on choosing the appropriate bridge version from an API compatibility perspective for upgrading to Kafka 4.0. Let me know if you have any thoughts. Thank you!
> >>>>
> >>>> Best,
> >>>> Kuan-Po
> >>>>
> >>>> On 2025/02/07 03:34:46 Kuan Po Tseng wrote:
> >>>>> Hi Chia-Ping,
> >>>>>
> >>>>> Sorry for the delayed response.
> >>>>> I’ve checked all relevant JIRAs using the following Jira Query Language:
> >>>>>
> >>>>> project = KAFKA AND status in (Resolved, Closed) AND fixVersion = 4.0.0 AND text ~ "Remove" order by updated DESC
> >>>>>
> >>>>> Based on this, I checked the JIRAs related to removing deprecated methods in client modules. The minimum backward-compatible client versions for client 4.0 are as follows:
> >>>>> - Producer: 3.3.0
> >>>>>   Reason: Partitioner#onNewBatch was deprecated in 3.3.0, and was removed by https://issues.apache.org/jira/browse/KAFKA-18295
> >>>>> - Consumer: 2.4.0
> >>>>>   Reason: Consumer#committed was deprecated in 2.4.0, and was removed by https://issues.apache.org/jira/browse/KAFKA-17451
> >>>>> - Admin: 3.3.0
> >>>>>   Reason: ListConsumerGroupOffsetsOptions was deprecated in 3.3.0 and was removed by https://issues.apache.org/jira/browse/KAFKA-18291
> >>>>>
> >>>>> You can find a list of all related JIRAs and pull requests in this Google Sheet:
> >>>>> https://docs.google.com/spreadsheets/d/1ZWNRk1rjWptjpGM2UtT0Q3lDULhrqkP_UfHr9roQW3M/edit?usp=sharing
> >>>>>
> >>>>> There are also some public methods removed in 4.0, such as:
> >>>>> - KafkaFuture#Function, KafkaFuture#thenApply https://issues.apache.org/jira/browse/KAFKA-17903
> >>>>> - JmxReported(String) https://issues.apache.org/jira/browse/KAFKA-18077
> >>>>> , but I'm uncertain about how we should handle these.
> >>>>>
> >>>>> Best,
> >>>>> Kuan-Po
> >>>>>
> >>>>> On 2025/02/06 19:08:49 Chia-Ping Tsai wrote:
> >>>>>> hi Kuan-Po
> >>>>>>
> >>>>>> any update? Now that an upgrade path for bridge versions exists, we can introduce additional "conditions" to assist users in selecting the "best" bridge version. For example, we can provide guidance on which bridge versions offer backward compatibility with Kafka 4.0 client or are compatible with Kafka 4.0 server.
> >>>>>>
> >>>>>> Best,
> >>>>>> Chia-Ping
> >>>>>>
> >>>>>> On 2025/01/22 04:48:36 Chia-Ping Tsai wrote:
> >>>>>>>> - If we support 2.0+ to 4.0 client/KS upgrade it's simpler, but of course brokers cannot be 4.0 yet -- but I guess this would be something natural? Given that the clients would be on 2.0, brokers cannot be 4.0 yet, or clients would have crashed already... Thus, I think I slightly prefer this one.
> >>>>>>>
> >>>>>>> Using a major version as a bridge is a viable approach. We can emphasize the limitations of this method to guide users in selecting the most suitable bridge version.
> >>>>>>>
> >>>>>>>> For KS, from an API compatibility POV, upgrading from anything older than 3.6 might not work any longer (for DSL users; of course, depending on what APIs they are using). And for PAPI, the old API was removed too, so only if the new one is used (introduced in 2.7) a seamless upgrade would work smoothly.
> >>>>>>>
> >>>>>>> You make a valid point. The previous discussion overlooked the APIs that were removed in version 4.0.
> >>>>>>>
> >>>>>>> We could also emphasize the BC advantages. As an example, users have the option of using version 2.7 as a bridge and subsequently upgrade without code alterations or recompilation. Of course, we need to check the version of other PAPI removal.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Chia-Ping
> >>>>>>>
> >>>>>>>> Matthias J. Sax <mj...@apache.org> wrote on January 22, 2025 at 2:55 AM:
> >>>>>>>> For KS, from an API compatibility POV, upgrading from anything older than 3.6 might not work any longer (for DSL users; of course, depending on what APIs they are using). And for PAPI, the old API was removed too, so only if the new one is used (introduced in 2.7) a seamless upgrade would work smoothly.
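
PS, for anyone who wants something concrete for the `upgrade.from` step mentioned at the top of this message: below is a minimal sketch in plain Java. It only builds the two sets of config properties for the two-rolling-bounce procedure discussed in this thread; it does not start a Streams application. The `upgrade.from` key comes from the thread. The class name `UpgradeFromSketch`, the helper names, the `my-app` application id, the broker address, and the starting version `2.3` are all made up for illustration. Please treat the Streams upgrade guide as the authority on the exact steps and accepted values.

```java
import java.util.Properties;

// Hypothetical helper class, not part of Kafka. It uses plain string keys
// so that it compiles without the kafka-streams jar on the classpath.
class UpgradeFromSketch {
    // Config key named in the thread.
    static final String UPGRADE_FROM = "upgrade.from";

    // First rolling bounce: deploy the bridge release, with upgrade.from
    // set to the version the application is being upgraded from.
    static Properties firstBounce(Properties base, String oldVersion) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty(UPGRADE_FROM, oldVersion);
        return p;
    }

    // Second rolling bounce: same release, with upgrade.from removed.
    static Properties secondBounce(Properties firstBounceProps) {
        Properties p = new Properties();
        p.putAll(firstBounceProps);
        p.remove(UPGRADE_FROM);
        return p;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "my-app");            // assumed value
        base.setProperty("bootstrap.servers", "localhost:9092");  // assumed value

        Properties first = firstBounce(base, "2.3");              // assumed old version
        Properties second = secondBounce(first);

        System.out.println("first bounce:  " + first);
        System.out.println("second bounce: " + second);
    }
}
```

The idea is that both bounces happen on the bridge release; only after the second one, with `upgrade.from` gone, would users go on to 4.0 as described above.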