Hi Colin, Thanks for the feedback. Those are very good points. I have made the following changes to the KIP as you had suggested: 1. Included the `timeoutMs` field in the `UpdateFeaturesRequest` schema. The initial implementation won't be making use of the field, but we can always use it in the future as the need arises. 2. Modified the `FinalizedFeaturesEpoch` field in `ApiVersionsResponse` to use int64. This is to avoid overflow problems in the future once ZK is gone.
I have also incorporated these changes into the versioning write path PR that is currently under review: https://github.com/apache/kafka/pull/9001. Cheers, Kowshik On Fri, Sep 25, 2020 at 4:57 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > Thanks for the feedback. It's a very good point. I have now modified the > KIP-584 write-up "goals" section a bit. It now mentions one of the goals as > enabling rolling upgrades using a single restart (instead of 2). Also I > have removed the text explicitly aiming for deprecation of IBP. Note that > previously under "Potential features in Kafka" the IBP was mentioned under > point (4) as a possible coarse-grained feature. Hopefully, now the 2 > sections of the KIP align with each other well. > > > Cheers, > Kowshik > > > On Fri, Sep 25, 2020 at 2:03 PM Colin McCabe <cmcc...@apache.org> wrote: > >> On Tue, Sep 22, 2020, at 00:43, Kowshik Prakasam wrote: >> > Hi all, >> > >> > I wanted to let you know that I have made the following changes to the >> > KIP-584 write up. The purpose is to ensure the design is correct for a >> few >> > things which came up during implementation: >> > >> >> Hi Kowshik, >> >> Thanks for the updates. >> >> > >> > 1. Per FeatureUpdate error code: The UPDATE_FEATURES controller API is >> no >> > longer transactional. Going forward, we allow for individual >> FeatureUpdate >> > to succeed/fail in the request. As a result, the response schema now >> > contains an error code per FeatureUpdate as well as a top-level error >> code. >> > Overall this is a better design because it better represents the nature >> of >> > the API: each FeatureUpdate in the request is independent of the other >> > updates, and the controller can process/apply these independently to ZK. >> > When an UPDATE_FEATURES request fails, this new design provides better >> > clarity to the caller on which FeatureUpdate could not be applied (via >> the >> > individual error codes). In the previous design, we were unable to >> achieve >> > such an increased level of clarity in communicating the error codes. >> > >> >> OK >> >> > >> > 2. Due to #1, there were some minor changes required to the proposed >> Admin >> > APIs (describeFeatures and updateFeatures). A few unnecessary public >> APIs >> > have been removed, and couple essential ones have been added. The latest >> > changes now represent the latest design. >> > >> > 3. The timeoutMs field has been removed from the the UPDATE_FEATURES API >> > request, since it was not found to be required during implementation. >> > >> >> Please don't get rid of timeoutMs. timeoutMs is required if you want to >> implement the ability to timeout the call if the controller can't get to it >> in time. This is important for avoiding congestion collapse where the >> controller collapses under the weight of lots of retries of the same set of >> calls. >> >> We may not be able to do it in the initial implementation, but we will >> eventually implement this for all the controller-bound RPCs. >> >> > > >> > > 2. Finalized feature version epoch data type has been made to be int32 >> > > (instead of int64). The reason is that the epoch value is the value >> of ZK >> > > node version, whose data type is int32. >> > > >> >> Sorry, I missed this earlier. Using 16 bit feature levels seems fine. >> However, please don't use a 32-bit epoch here. We deliberately made the >> epoch 64 bits to avoid overflow problems in the future once ZK is gone. >> >> best, >> Colin >> >> > > 3. Introduced a new 'status' field in the '/features' ZK node schema. >> The >> > > purpose is to implement Colin's earlier point for the strategy for >> > > transitioning from not having a /features znode to having one. An >> > > explanation has been provided in the following section of the KIP >> detailing >> > > the different cases: >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus >> > > . >> > > >> > > Please let me know if you have any questions or concerns. >> > > >> > > >> > > Cheers, >> > > Kowshik >> > > >> > > >> > > >> > > Cheers, >> > > Kowshik >> > > >> > > On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam < >> kpraka...@confluent.io> >> > > wrote: >> > > >> > >> Hi all, >> > >> >> > >> This KIP vote has been open for ~12 days. The summary of the votes is >> > >> that we have 3 binding votes (Colin, Guozhang, Jun), and 3 >> non-binding >> > >> votes (David, Dhruvil, Boyang). Therefore, the KIP vote passes. I'll >> mark >> > >> KIP as accepted and start working on the implementation. >> > >> >> > >> Thanks a lot! >> > >> >> > >> >> > >> Cheers, >> > >> Kowshik >> > >> >> > >> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe <cmcc...@apache.org> >> wrote: >> > >> >> > >>> Thanks, Kowshik. +1 (binding) >> > >>> >> > >>> best, >> > >>> Colin >> > >>> >> > >>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote: >> > >>> > Hi Colin, >> > >>> > >> > >>> > Thanks for the explanation! I agree with you, and I have updated >> the >> > >>> > KIP. >> > >>> > Here is a link to relevant section: >> > >>> > >> > >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues >> > >>> > >> > >>> > >> > >>> > Cheers, >> > >>> > Kowshik >> > >>> > >> > >>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe <cmcc...@apache.org> >> > >>> wrote: >> > >>> > >> > >>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam wrote: >> > >>> > > > (Kowshik): Great point! However for case #1, I'm not sure why >> we >> > >>> need to >> > >>> > > > create a '/features' ZK node with disabled features. Instead, >> do >> > >>> you see >> > >>> > > > any drawback if we just do not create it? i.e. if IBP is less >> than >> > >>> 2.6, >> > >>> > > the >> > >>> > > > controller treats the case as though the versioning system is >> > >>> completely >> > >>> > > > disabled, and would not create a non-existing '/features' >> node. >> > >>> > > >> > >>> > > Hi Kowshik, >> > >>> > > >> > >>> > > When the IBP is less than 2.6, but the software has been >> upgraded to >> > >>> a >> > >>> > > state where it supports this KIP, that >> > >>> > > means the user is upgrading from an earlier version of the >> > >>> software. In >> > >>> > > this case, we want to start with all the features disabled and >> allow >> > >>> the >> > >>> > > user to enable them when they are ready. >> > >>> > > >> > >>> > > Enabling all the possible features immediately after an upgrade >> > >>> could be >> > >>> > > harmful to the cluster. On the other hand, for a new cluster, >> we do >> > >>> want >> > >>> > > to enable all the possible features immediately . I was >> proposing >> > >>> this as a >> > >>> > > way to distinguish the two cases (since the new cluster will >> never be >> > >>> > > started with an old IBP). >> > >>> > > >> > >>> > > > Colin MccCabe wrote: >> > >>> > > > > And now, something a little bit bigger (sorry). For >> finalized >> > >>> > > features, >> > >>> > > > > why do we need both min_version_level and max_version_level? >> > >>> Assuming >> > >>> > > that >> > >>> > > > > we want all the brokers to be on the same feature version >> level, >> > >>> we >> > >>> > > really only care >> > >>> > > > > about three numbers for each feature, right? The minimum >> > >>> supported >> > >>> > > version >> > >>> > > > > level, the maximum supported version level, and the current >> > >>> active >> > >>> > > version level. >> > >>> > > > >> > >>> > > > > We don't actually want different brokers to be on different >> > >>> versions of >> > >>> > > > > the same feature, right? So we can just have one number for >> > >>> current >> > >>> > > > > version level, rather than two. At least that's what I was >> > >>> thinking >> > >>> > > -- let >> > >>> > > > > me know if I missed something. >> > >>> > > > >> > >>> > > > (Kowshik): It is my understanding that the "current active >> version >> > >>> level" >> > >>> > > > that you have mentioned, is the "max_version_level". But we >> still >> > >>> > > > maintain/publish both min and max version levels, because, the >> > >>> detail >> > >>> > > about >> > >>> > > > min level is useful to external clients. This is described >> below. >> > >>> > > > >> > >>> > > > For any feature F, think of the closed range: >> [min_version_level, >> > >>> > > > max_version_level] as the range of finalized versions, that's >> > >>> guaranteed >> > >>> > > to >> > >>> > > > be supported by all brokers in the cluster. >> > >>> > > > - "max_version_level" is the finalized highest common version >> > >>> among all >> > >>> > > > brokers, >> > >>> > > > - "min_version_level" is the finalized lowest common version >> > >>> among all >> > >>> > > > brokers. >> > >>> > > > >> > >>> > > > Next, think of "client" here as the "user of the new feature >> > >>> versions >> > >>> > > > system". Imagine that such a client learns about finalized >> feature >> > >>> > > > versions, and exercises some logic based on the version. These >> > >>> clients >> > >>> > > can >> > >>> > > > be of 2 types: >> > >>> > > > 1. Some part of the broker code itself could behave like a >> client >> > >>> trying >> > >>> > > to >> > >>> > > > use some feature that's "internal" to the broker cluster. >> Such a >> > >>> client >> > >>> > > > would learn the latest finalized features via ZK. >> > >>> > > > 2. An external system (ex: Streams) could behave like a >> client, >> > >>> trying to >> > >>> > > > use some "external" facing feature. Such a client would learn >> > >>> latest >> > >>> > > > finalized features via ApiVersionsRequest. Ex: >> group_coordinator >> > >>> feature >> > >>> > > > described in the KIP. >> > >>> > > > >> > >>> > > > Next, imagine that for F, the max_version_level is >> successfully >> > >>> bumped by >> > >>> > > > +1 (via Controller API). Now it is guaranteed that all brokers >> > >>> (i.e. >> > >>> > > > internal clients) understand max_version_level + 1. However, >> it is >> > >>> still >> > >>> > > > not guaranteed that all external clients have support for (or >> have >> > >>> > > > activated) the logic for the newer version. Why? Because, >> this is >> > >>> > > > subjective as explained next: >> > >>> > > > >> > >>> > > > 1. On one hand, imagine F as an internal feature only >> relevant to >> > >>> > > Brokers. >> > >>> > > > The binary for the internal client logic is controlled by >> Broker >> > >>> cluster >> > >>> > > > deployments. When shipping a new Broker release, we wouldn't >> bump >> > >>> max >> > >>> > > > "supported" feature version for F by 1, unless we have >> introduced >> > >>> some >> > >>> > > new >> > >>> > > > logic (with a potentially breaking change) in the Broker. >> > >>> Furthermore, >> > >>> > > such >> > >>> > > > feature logic in the broker should/will not be implemented in >> a >> > >>> way that >> > >>> > > it >> > >>> > > > would activate logic for an older feature version after it has >> > >>> migrated >> > >>> > > to >> > >>> > > > using the logic for a newer feature version (because this >> could >> > >>> break the >> > >>> > > > cluster!). For these cases, max_version_level will be very >> useful >> > >>> for >> > >>> > > > decision making. >> > >>> > > > >> > >>> > > > 2. On the other hand, imagine F as an external facing feature. >> > >>> External >> > >>> > > > clients are not within the control of Broker cluster. An >> external >> > >>> client >> > >>> > > > may not have upgraded it's code (yet) to use >> 'max_version_level + >> > >>> 1'. >> > >>> > > But, >> > >>> > > > the Kafka cluster could have been deployed with support for >> > >>> > > > 'max_version_level + 1' of an external facing feature. Now, >> these >> > >>> > > external >> > >>> > > > clients (until an upgrade) are benefitted in learning "what >> is the >> > >>> lowest >> > >>> > > > common version for F among all brokers?". This is where the >> > >>> > > > "min_version_level" becomes useful. Using this, a client could >> > >>> learn the >> > >>> > > > specific supported versions that's lower than >> max_version_level >> > >>> (instead >> > >>> > > of >> > >>> > > > assuming that all brokers support the range: [1, >> > >>> max_version_level]). For >> > >>> > > > example, if the cluster deprecates "min_version_level", then >> the >> > >>> client >> > >>> > > > becomes aware because it periodically learns the latest >> > >>> > > "min_version_level" >> > >>> > > > via ApiVersionsRequest. >> > >>> > > > >> > >>> > > >> > >>> > > Thanks for the explanation. I agree that this does make sense >> when >> > >>> you >> > >>> > > take the client perspective into account. >> > >>> > > >> > >>> > > best, >> > >>> > > Colin >> > >>> > > >> > >>> > > >> > >>> > > > >> > >>> > > > Cheers, >> > >>> > > > Kowshik >> > >>> > > > >> > >>> > > > >> > >>> > > > >> > >>> > > > >> > >>> > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe < >> cmcc...@apache.org> >> > >>> > > wrote: >> > >>> > > > >> > >>> > > > > Hi Kowshik, >> > >>> > > > > >> > >>> > > > > Thanks again for working on this-- it looks great. I went >> over >> > >>> the KIP >> > >>> > > > > again and have a few more comments. >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > It would be good to note that deprecating a feature version >> (in >> > >>> other >> > >>> > > > > words, increasing minVersionLevel on the broker) is an >> > >>> incompatible >> > >>> > > change, >> > >>> > > > > which requires a major release of Kafka. >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > I think the strategy for transitioning from not having a >> > >>> /features >> > >>> > > znode >> > >>> > > > > to having one could use some work. The current proposal is >> to >> > >>> wait for >> > >>> > > all >> > >>> > > > > the brokers to fill in their feature znodes and then pick >> the >> > >>> highest >> > >>> > > > > common versions. But that requires blocking in the >> controller >> > >>> startup >> > >>> > > code >> > >>> > > > > until the whole cluster is active (technically, a point in >> time >> > >>> which >> > >>> > > we >> > >>> > > > > never really know that we have reached...) >> > >>> > > > > >> > >>> > > > > Instead, I think it would be better to have a strategy like >> this: >> > >>> > > > > 1. If the controller comes up and there is no /features >> znode >> > >>> AND the >> > >>> > > IBP >> > >>> > > > > is less than 2.6, create a /features znode where all the >> > >>> features are >> > >>> > > > > disabled. >> > >>> > > > > 2. If the controller comes up and there is no /features >> znode >> > >>> AND the >> > >>> > > IBP >> > >>> > > > > is greater than or equal to 2.6, create a /features znode >> where >> > >>> all the >> > >>> > > > > features are enabled at the highest versions supported by >> the >> > >>> > > controller. >> > >>> > > > > >> > >>> > > > > People upgrading from the pre-KIP-584 >> > >>> <https://issues.apache.org/jira/browse/KIP-584> >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584> world will >> end >> > >>> up in >> > >>> > > case >> > >>> > > > > #1 because they have to do a double roll to upgrade, and >> during >> > >>> the >> > >>> > > first >> > >>> > > > > roll, the IBP is unchanged. People creating new clusters >> from >> > >>> scratch >> > >>> > > will >> > >>> > > > > end up in case #2, which is what we want since we don't >> want a >> > >>> brand >> > >>> > > new >> > >>> > > > > cluster to be using old feature flag versions. >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > UpdateFeaturesResponse#ErrorMessage should specify >> > >>> nullableVersions >> > >>> > > since >> > >>> > > > > null is a valid value here >> > >>> > > > > >> > >>> > > > > Also, in the message format, the tags we use need to start >> at 0. >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > I don't think we need the FEATURE_UPDATE_IN_PROGRESS error >> > >>> code. The >> > >>> > > > > controller is basically single-threaded and will only do >> one of >> > >>> these >> > >>> > > > > operations at once. Even if it weren't, though, we could >> simply >> > >>> block >> > >>> > > the >> > >>> > > > > second operation behind the first one. >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > For updateFeatures, it would be good to specify that if a >> single >> > >>> > > feature >> > >>> > > > > version update in the batch can't be done, none of them are >> > >>> done. I >> > >>> > > think >> > >>> > > > > this was the intention, but I wasn't able to find it >> spelled out >> > >>> > > (maybe i >> > >>> > > > > missed it). >> > >>> > > > > >> > >>> > > > > === >> > >>> > > > > >> > >>> > > > > And now, something a little bit bigger (sorry). For >> finalized >> > >>> > > features, >> > >>> > > > > why do we need both min_version_level and max_version_level? >> > >>> Assuming >> > >>> > > that >> > >>> > > > > we want all the brokers to be on the same feature version >> level, >> > >>> we >> > >>> > > really >> > >>> > > > > only care about three numbers for each feature, right? The >> > >>> minimum >> > >>> > > > > supported version level, the maximum supported version >> level, >> > >>> and the >> > >>> > > > > current active version level. >> > >>> > > > > >> > >>> > > > > We don't actually want different brokers to be on different >> > >>> versions of >> > >>> > > > > the same feature, right? So we can just have one number for >> > >>> current >> > >>> > > > > version level, rather than two. At least that's what I was >> > >>> thinking >> > >>> > > -- let >> > >>> > > > > me know if I missed something. >> > >>> > > > > >> > >>> > > > > best, >> > >>> > > > > Colin >> > >>> > > > > >> > >>> > > > > >> > >>> > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah wrote: >> > >>> > > > > > Thanks for the KIP! +1 (non-binding) >> > >>> > > > > > >> > >>> > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot < >> > >>> dja...@confluent.io> >> > >>> > > wrote: >> > >>> > > > > > >> > >>> > > > > > > Great KIP, thanks! +1 (non-binding) >> > >>> > > > > > > >> > >>> > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang Wang < >> > >>> wangg...@gmail.com> >> > >>> > > > > wrote: >> > >>> > > > > > > >> > >>> > > > > > > > Thanks for the great KIP Kowshik, +1 (binding). >> > >>> > > > > > > > >> > >>> > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao < >> j...@confluent.io >> > >>> > >> > >>> > > wrote: >> > >>> > > > > > > > >> > >>> > > > > > > > > Hi, Kowshik, >> > >>> > > > > > > > > >> > >>> > > > > > > > > Thanks for the KIP. +1 >> > >>> > > > > > > > > >> > >>> > > > > > > > > Jun >> > >>> > > > > > > > > >> > >>> > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik Prakasam < >> > >>> > > > > > > > kpraka...@confluent.io> >> > >>> > > > > > > > > wrote: >> > >>> > > > > > > > > >> > >>> > > > > > > > > > Hi all, >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > I'd like to start a vote for KIP-584 >> > >>> <https://issues.apache.org/jira/browse/KIP-584> >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584>. The link >> to >> > >>> the KIP >> > >>> > > can >> > >>> > > > > be >> > >>> > > > > > > found >> > >>> > > > > > > > > > here: >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > >> > >>> > > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > >> > >>> > > > > >> > >>> > > >> > >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features >> > >>> > > > > > > > > > . >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > Thanks! >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > Cheers, >> > >>> > > > > > > > > > Kowshik >> > >>> > > > > > > > > > >> > >>> > > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > > -- >> > >>> > > > > > > > -- Guozhang >> > >>> > > > > > > > >> > >>> > > > > > > >> > >>> > > > > > >> > >>> > > > > >> > >>> > > > >> > >>> > > >> > >>> > >> > >>> >> > >> >> > >> >