Hi all, I wanted to let you know that I have made the following minor changes to the `kafka-features` CLI tool description in the KIP-584 write up. The purpose is to ensure the design is correct for a few things which came up during implementation:
1. The CLI tool now produces a tab-formatted output instead of JSON. This aligns with the type of format produced by other admin CLI tools of Kafka, ex: `kafka-topics`. 2. Whenever feature updates are performed, the output of the CLI tool shows the result of each feature update that was applied. 3. The CLI tool accepts an optional argument `--dry-run` which lets the user preview the feature updates before applying them. The following section of the KIP has been updated with the above changes: https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Toolingsupport Please let me know if you have any questions. Cheers, Kowshik On Thu, Oct 8, 2020 at 1:12 AM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > This is a very good point. I have updated the feature version deprecation > section mentioning the same: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation > . > > Thank you for the suggestion. > > > Cheers, > Kowshik > > > On Tue, Oct 6, 2020 at 5:30 PM Jun Rao <j...@confluent.io> wrote: > >> Hi, Kowshik, >> >> Thanks for the follow up. Both look good to me. >> >> For 2, it would be useful to also add that an admin should make sure that >> no clients are using a deprecated feature version (e.g. using the client >> version metric) before deploying a release that deprecates it. >> >> Thanks, >> >> Jun >> >> On Tue, Oct 6, 2020 at 3:46 PM Kowshik Prakasam <kpraka...@confluent.io> >> wrote: >> >> > Hi Jun, >> > >> > I have added the following details in the KIP-584 write up: >> > >> > 1. Deployment, IBP deprecation and avoidance of double rolls. This >> section >> > talks about the various phases of work that would be required to use >> this >> > KIP to eventually avoid Broker double rolls in the cluster (whenever IBP >> > values are advanced). Link to section: >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Deployment,IBPdeprecationandavoidanceofdoublerolls >> > . >> > >> > 2. Feature version deprecation. This section explains the idea for >> feature >> > version deprecation (using highest supported feature min version) which >> you >> > had proposed during code review: >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation >> > . >> > >> > Please let me know if you have any questions. >> > >> > >> > Cheers, >> > Kowshik >> > >> > >> > On Tue, Sep 29, 2020 at 11:07 AM Jun Rao <j...@confluent.io> wrote: >> > >> > > Hi, Kowshik, >> > > >> > > Thanks for the update. Regarding enabling a single rolling restart in >> the >> > > future, could we sketch out a bit how this will work by treating IBP >> as a >> > > feature? For example, IBP currently uses the release version and this >> KIP >> > > uses an integer for versions. How do we bridge the gap between the >> two? >> > > Does min.version still make sense for IBP as a feature? >> > > >> > > Thanks, >> > > >> > > Jun >> > > >> > > On Fri, Sep 25, 2020 at 5:57 PM Kowshik Prakasam < >> kpraka...@confluent.io >> > > >> > > wrote: >> > > >> > > > Hi Colin, >> > > > >> > > > Thanks for the feedback. Those are very good points. I have made the >> > > > following changes to the KIP as you had suggested: >> > > > 1. Included the `timeoutMs` field in the `UpdateFeaturesRequest` >> > schema. >> > > > The initial implementation won't be making use of the field, but we >> can >> > > > always use it in the future as the need arises. >> > > > 2. Modified the `FinalizedFeaturesEpoch` field in >> `ApiVersionsResponse` >> > > to >> > > > use int64. This is to avoid overflow problems in the future once ZK >> is >> > > > gone. >> > > > >> > > > I have also incorporated these changes into the versioning write >> path >> > PR >> > > > that is currently under review: >> > > https://github.com/apache/kafka/pull/9001. >> > > > >> > > > >> > > > Cheers, >> > > > Kowshik >> > > > >> > > > >> > > > >> > > > On Fri, Sep 25, 2020 at 4:57 PM Kowshik Prakasam < >> > kpraka...@confluent.io >> > > > >> > > > wrote: >> > > > >> > > > > Hi Jun, >> > > > > >> > > > > Thanks for the feedback. It's a very good point. I have now >> modified >> > > the >> > > > > KIP-584 write-up "goals" section a bit. It now mentions one of the >> > > goals >> > > > as >> > > > > enabling rolling upgrades using a single restart (instead of 2). >> > Also I >> > > > > have removed the text explicitly aiming for deprecation of IBP. >> Note >> > > that >> > > > > previously under "Potential features in Kafka" the IBP was >> mentioned >> > > > under >> > > > > point (4) as a possible coarse-grained feature. Hopefully, now >> the 2 >> > > > > sections of the KIP align with each other well. >> > > > > >> > > > > >> > > > > Cheers, >> > > > > Kowshik >> > > > > >> > > > > >> > > > > On Fri, Sep 25, 2020 at 2:03 PM Colin McCabe <cmcc...@apache.org> >> > > wrote: >> > > > > >> > > > >> On Tue, Sep 22, 2020, at 00:43, Kowshik Prakasam wrote: >> > > > >> > Hi all, >> > > > >> > >> > > > >> > I wanted to let you know that I have made the following >> changes to >> > > the >> > > > >> > KIP-584 write up. The purpose is to ensure the design is >> correct >> > > for a >> > > > >> few >> > > > >> > things which came up during implementation: >> > > > >> > >> > > > >> >> > > > >> Hi Kowshik, >> > > > >> >> > > > >> Thanks for the updates. >> > > > >> >> > > > >> > >> > > > >> > 1. Per FeatureUpdate error code: The UPDATE_FEATURES controller >> > API >> > > is >> > > > >> no >> > > > >> > longer transactional. Going forward, we allow for individual >> > > > >> FeatureUpdate >> > > > >> > to succeed/fail in the request. As a result, the response >> schema >> > now >> > > > >> > contains an error code per FeatureUpdate as well as a top-level >> > > error >> > > > >> code. >> > > > >> > Overall this is a better design because it better represents >> the >> > > > nature >> > > > >> of >> > > > >> > the API: each FeatureUpdate in the request is independent of >> the >> > > other >> > > > >> > updates, and the controller can process/apply these >> independently >> > to >> > > > ZK. >> > > > >> > When an UPDATE_FEATURES request fails, this new design provides >> > > better >> > > > >> > clarity to the caller on which FeatureUpdate could not be >> applied >> > > (via >> > > > >> the >> > > > >> > individual error codes). In the previous design, we were >> unable to >> > > > >> achieve >> > > > >> > such an increased level of clarity in communicating the error >> > codes. >> > > > >> > >> > > > >> >> > > > >> OK >> > > > >> >> > > > >> > >> > > > >> > 2. Due to #1, there were some minor changes required to the >> > proposed >> > > > >> Admin >> > > > >> > APIs (describeFeatures and updateFeatures). A few unnecessary >> > public >> > > > >> APIs >> > > > >> > have been removed, and couple essential ones have been added. >> The >> > > > latest >> > > > >> > changes now represent the latest design. >> > > > >> > >> > > > >> > 3. The timeoutMs field has been removed from the the >> > UPDATE_FEATURES >> > > > API >> > > > >> > request, since it was not found to be required during >> > > implementation. >> > > > >> > >> > > > >> >> > > > >> Please don't get rid of timeoutMs. timeoutMs is required if you >> > want >> > > to >> > > > >> implement the ability to timeout the call if the controller can't >> > get >> > > > to it >> > > > >> in time. This is important for avoiding congestion collapse >> where >> > the >> > > > >> controller collapses under the weight of lots of retries of the >> same >> > > > set of >> > > > >> calls. >> > > > >> >> > > > >> We may not be able to do it in the initial implementation, but we >> > will >> > > > >> eventually implement this for all the controller-bound RPCs. >> > > > >> >> > > > >> > > >> > > > >> > > 2. Finalized feature version epoch data type has been made >> to be >> > > > int32 >> > > > >> > > (instead of int64). The reason is that the epoch value is the >> > > value >> > > > >> of ZK >> > > > >> > > node version, whose data type is int32. >> > > > >> > > >> > > > >> >> > > > >> Sorry, I missed this earlier. Using 16 bit feature levels seems >> > fine. >> > > > >> However, please don't use a 32-bit epoch here. We deliberately >> made >> > > the >> > > > >> epoch 64 bits to avoid overflow problems in the future once ZK is >> > > gone. >> > > > >> >> > > > >> best, >> > > > >> Colin >> > > > >> >> > > > >> > > 3. Introduced a new 'status' field in the '/features' ZK node >> > > > schema. >> > > > >> The >> > > > >> > > purpose is to implement Colin's earlier point for the >> strategy >> > for >> > > > >> > > transitioning from not having a /features znode to having >> one. >> > An >> > > > >> > > explanation has been provided in the following section of the >> > KIP >> > > > >> detailing >> > > > >> > > the different cases: >> > > > >> > > >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus >> > > > >> > > . >> > > > >> > > >> > > > >> > > Please let me know if you have any questions or concerns. >> > > > >> > > >> > > > >> > > >> > > > >> > > Cheers, >> > > > >> > > Kowshik >> > > > >> > > >> > > > >> > > >> > > > >> > > >> > > > >> > > Cheers, >> > > > >> > > Kowshik >> > > > >> > > >> > > > >> > > On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam < >> > > > >> kpraka...@confluent.io> >> > > > >> > > wrote: >> > > > >> > > >> > > > >> > >> Hi all, >> > > > >> > >> >> > > > >> > >> This KIP vote has been open for ~12 days. The summary of the >> > > votes >> > > > is >> > > > >> > >> that we have 3 binding votes (Colin, Guozhang, Jun), and 3 >> > > > >> non-binding >> > > > >> > >> votes (David, Dhruvil, Boyang). Therefore, the KIP vote >> passes. >> > > > I'll >> > > > >> mark >> > > > >> > >> KIP as accepted and start working on the implementation. >> > > > >> > >> >> > > > >> > >> Thanks a lot! >> > > > >> > >> >> > > > >> > >> >> > > > >> > >> Cheers, >> > > > >> > >> Kowshik >> > > > >> > >> >> > > > >> > >> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe < >> > > cmcc...@apache.org> >> > > > >> wrote: >> > > > >> > >> >> > > > >> > >>> Thanks, Kowshik. +1 (binding) >> > > > >> > >>> >> > > > >> > >>> best, >> > > > >> > >>> Colin >> > > > >> > >>> >> > > > >> > >>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote: >> > > > >> > >>> > Hi Colin, >> > > > >> > >>> > >> > > > >> > >>> > Thanks for the explanation! I agree with you, and I have >> > > updated >> > > > >> the >> > > > >> > >>> > KIP. >> > > > >> > >>> > Here is a link to relevant section: >> > > > >> > >>> > >> > > > >> > >>> >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues >> > > > >> > >>> > >> > > > >> > >>> > >> > > > >> > >>> > Cheers, >> > > > >> > >>> > Kowshik >> > > > >> > >>> > >> > > > >> > >>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe < >> > > > cmcc...@apache.org> >> > > > >> > >>> wrote: >> > > > >> > >>> > >> > > > >> > >>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam wrote: >> > > > >> > >>> > > > (Kowshik): Great point! However for case #1, I'm not >> > sure >> > > > why >> > > > >> we >> > > > >> > >>> need to >> > > > >> > >>> > > > create a '/features' ZK node with disabled features. >> > > > Instead, >> > > > >> do >> > > > >> > >>> you see >> > > > >> > >>> > > > any drawback if we just do not create it? i.e. if >> IBP is >> > > > less >> > > > >> than >> > > > >> > >>> 2.6, >> > > > >> > >>> > > the >> > > > >> > >>> > > > controller treats the case as though the versioning >> > system >> > > > is >> > > > >> > >>> completely >> > > > >> > >>> > > > disabled, and would not create a non-existing >> > '/features' >> > > > >> node. >> > > > >> > >>> > > >> > > > >> > >>> > > Hi Kowshik, >> > > > >> > >>> > > >> > > > >> > >>> > > When the IBP is less than 2.6, but the software has >> been >> > > > >> upgraded to >> > > > >> > >>> a >> > > > >> > >>> > > state where it supports this KIP, that >> > > > >> > >>> > > means the user is upgrading from an earlier version of >> > the >> > > > >> > >>> software. In >> > > > >> > >>> > > this case, we want to start with all the features >> disabled >> > > and >> > > > >> allow >> > > > >> > >>> the >> > > > >> > >>> > > user to enable them when they are ready. >> > > > >> > >>> > > >> > > > >> > >>> > > Enabling all the possible features immediately after an >> > > > upgrade >> > > > >> > >>> could be >> > > > >> > >>> > > harmful to the cluster. On the other hand, for a new >> > > cluster, >> > > > >> we do >> > > > >> > >>> want >> > > > >> > >>> > > to enable all the possible features immediately . I was >> > > > >> proposing >> > > > >> > >>> this as a >> > > > >> > >>> > > way to distinguish the two cases (since the new cluster >> > will >> > > > >> never be >> > > > >> > >>> > > started with an old IBP). >> > > > >> > >>> > > >> > > > >> > >>> > > > Colin MccCabe wrote: >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry). >> For >> > > > >> finalized >> > > > >> > >>> > > features, >> > > > >> > >>> > > > > why do we need both min_version_level and >> > > > max_version_level? >> > > > >> > >>> Assuming >> > > > >> > >>> > > that >> > > > >> > >>> > > > > we want all the brokers to be on the same feature >> > > version >> > > > >> level, >> > > > >> > >>> we >> > > > >> > >>> > > really only care >> > > > >> > >>> > > > > about three numbers for each feature, right? The >> > > minimum >> > > > >> > >>> supported >> > > > >> > >>> > > version >> > > > >> > >>> > > > > level, the maximum supported version level, and the >> > > > current >> > > > >> > >>> active >> > > > >> > >>> > > version level. >> > > > >> > >>> > > > >> > > > >> > >>> > > > > We don't actually want different brokers to be on >> > > > different >> > > > >> > >>> versions of >> > > > >> > >>> > > > > the same feature, right? So we can just have one >> > number >> > > > for >> > > > >> > >>> current >> > > > >> > >>> > > > > version level, rather than two. At least that's >> what >> > I >> > > > was >> > > > >> > >>> thinking >> > > > >> > >>> > > -- let >> > > > >> > >>> > > > > me know if I missed something. >> > > > >> > >>> > > > >> > > > >> > >>> > > > (Kowshik): It is my understanding that the "current >> > active >> > > > >> version >> > > > >> > >>> level" >> > > > >> > >>> > > > that you have mentioned, is the "max_version_level". >> But >> > > we >> > > > >> still >> > > > >> > >>> > > > maintain/publish both min and max version levels, >> > because, >> > > > the >> > > > >> > >>> detail >> > > > >> > >>> > > about >> > > > >> > >>> > > > min level is useful to external clients. This is >> > described >> > > > >> below. >> > > > >> > >>> > > > >> > > > >> > >>> > > > For any feature F, think of the closed range: >> > > > >> [min_version_level, >> > > > >> > >>> > > > max_version_level] as the range of finalized >> versions, >> > > > that's >> > > > >> > >>> guaranteed >> > > > >> > >>> > > to >> > > > >> > >>> > > > be supported by all brokers in the cluster. >> > > > >> > >>> > > > - "max_version_level" is the finalized highest >> common >> > > > version >> > > > >> > >>> among all >> > > > >> > >>> > > > brokers, >> > > > >> > >>> > > > - "min_version_level" is the finalized lowest common >> > > > version >> > > > >> > >>> among all >> > > > >> > >>> > > > brokers. >> > > > >> > >>> > > > >> > > > >> > >>> > > > Next, think of "client" here as the "user of the new >> > > feature >> > > > >> > >>> versions >> > > > >> > >>> > > > system". Imagine that such a client learns about >> > finalized >> > > > >> feature >> > > > >> > >>> > > > versions, and exercises some logic based on the >> version. >> > > > These >> > > > >> > >>> clients >> > > > >> > >>> > > can >> > > > >> > >>> > > > be of 2 types: >> > > > >> > >>> > > > 1. Some part of the broker code itself could behave >> > like a >> > > > >> client >> > > > >> > >>> trying >> > > > >> > >>> > > to >> > > > >> > >>> > > > use some feature that's "internal" to the broker >> > cluster. >> > > > >> Such a >> > > > >> > >>> client >> > > > >> > >>> > > > would learn the latest finalized features via ZK. >> > > > >> > >>> > > > 2. An external system (ex: Streams) could behave >> like a >> > > > >> client, >> > > > >> > >>> trying to >> > > > >> > >>> > > > use some "external" facing feature. Such a client >> would >> > > > learn >> > > > >> > >>> latest >> > > > >> > >>> > > > finalized features via ApiVersionsRequest. Ex: >> > > > >> group_coordinator >> > > > >> > >>> feature >> > > > >> > >>> > > > described in the KIP. >> > > > >> > >>> > > > >> > > > >> > >>> > > > Next, imagine that for F, the max_version_level is >> > > > >> successfully >> > > > >> > >>> bumped by >> > > > >> > >>> > > > +1 (via Controller API). Now it is guaranteed that >> all >> > > > brokers >> > > > >> > >>> (i.e. >> > > > >> > >>> > > > internal clients) understand max_version_level + 1. >> > > However, >> > > > >> it is >> > > > >> > >>> still >> > > > >> > >>> > > > not guaranteed that all external clients have support >> > for >> > > > (or >> > > > >> have >> > > > >> > >>> > > > activated) the logic for the newer version. Why? >> > Because, >> > > > >> this is >> > > > >> > >>> > > > subjective as explained next: >> > > > >> > >>> > > > >> > > > >> > >>> > > > 1. On one hand, imagine F as an internal feature only >> > > > >> relevant to >> > > > >> > >>> > > Brokers. >> > > > >> > >>> > > > The binary for the internal client logic is >> controlled >> > by >> > > > >> Broker >> > > > >> > >>> cluster >> > > > >> > >>> > > > deployments. When shipping a new Broker release, we >> > > wouldn't >> > > > >> bump >> > > > >> > >>> max >> > > > >> > >>> > > > "supported" feature version for F by 1, unless we >> have >> > > > >> introduced >> > > > >> > >>> some >> > > > >> > >>> > > new >> > > > >> > >>> > > > logic (with a potentially breaking change) in the >> > Broker. >> > > > >> > >>> Furthermore, >> > > > >> > >>> > > such >> > > > >> > >>> > > > feature logic in the broker should/will not be >> > implemented >> > > > in >> > > > >> a >> > > > >> > >>> way that >> > > > >> > >>> > > it >> > > > >> > >>> > > > would activate logic for an older feature version >> after >> > it >> > > > has >> > > > >> > >>> migrated >> > > > >> > >>> > > to >> > > > >> > >>> > > > using the logic for a newer feature version (because >> > this >> > > > >> could >> > > > >> > >>> break the >> > > > >> > >>> > > > cluster!). For these cases, max_version_level will be >> > very >> > > > >> useful >> > > > >> > >>> for >> > > > >> > >>> > > > decision making. >> > > > >> > >>> > > > >> > > > >> > >>> > > > 2. On the other hand, imagine F as an external facing >> > > > feature. >> > > > >> > >>> External >> > > > >> > >>> > > > clients are not within the control of Broker >> cluster. An >> > > > >> external >> > > > >> > >>> client >> > > > >> > >>> > > > may not have upgraded it's code (yet) to use >> > > > >> 'max_version_level + >> > > > >> > >>> 1'. >> > > > >> > >>> > > But, >> > > > >> > >>> > > > the Kafka cluster could have been deployed with >> support >> > > for >> > > > >> > >>> > > > 'max_version_level + 1' of an external facing >> feature. >> > > Now, >> > > > >> these >> > > > >> > >>> > > external >> > > > >> > >>> > > > clients (until an upgrade) are benefitted in learning >> > > "what >> > > > >> is the >> > > > >> > >>> lowest >> > > > >> > >>> > > > common version for F among all brokers?". This is >> where >> > > the >> > > > >> > >>> > > > "min_version_level" becomes useful. Using this, a >> client >> > > > could >> > > > >> > >>> learn the >> > > > >> > >>> > > > specific supported versions that's lower than >> > > > >> max_version_level >> > > > >> > >>> (instead >> > > > >> > >>> > > of >> > > > >> > >>> > > > assuming that all brokers support the range: [1, >> > > > >> > >>> max_version_level]). For >> > > > >> > >>> > > > example, if the cluster deprecates >> "min_version_level", >> > > then >> > > > >> the >> > > > >> > >>> client >> > > > >> > >>> > > > becomes aware because it periodically learns the >> latest >> > > > >> > >>> > > "min_version_level" >> > > > >> > >>> > > > via ApiVersionsRequest. >> > > > >> > >>> > > > >> > > > >> > >>> > > >> > > > >> > >>> > > Thanks for the explanation. I agree that this does >> make >> > > sense >> > > > >> when >> > > > >> > >>> you >> > > > >> > >>> > > take the client perspective into account. >> > > > >> > >>> > > >> > > > >> > >>> > > best, >> > > > >> > >>> > > Colin >> > > > >> > >>> > > >> > > > >> > >>> > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > Cheers, >> > > > >> > >>> > > > Kowshik >> > > > >> > >>> > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe < >> > > > >> cmcc...@apache.org> >> > > > >> > >>> > > wrote: >> > > > >> > >>> > > > >> > > > >> > >>> > > > > Hi Kowshik, >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > Thanks again for working on this-- it looks >> great. I >> > > went >> > > > >> over >> > > > >> > >>> the KIP >> > > > >> > >>> > > > > again and have a few more comments. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > It would be good to note that deprecating a feature >> > > > version >> > > > >> (in >> > > > >> > >>> other >> > > > >> > >>> > > > > words, increasing minVersionLevel on the broker) >> is an >> > > > >> > >>> incompatible >> > > > >> > >>> > > change, >> > > > >> > >>> > > > > which requires a major release of Kafka. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > I think the strategy for transitioning from not >> > having a >> > > > >> > >>> /features >> > > > >> > >>> > > znode >> > > > >> > >>> > > > > to having one could use some work. The current >> > proposal >> > > is >> > > > >> to >> > > > >> > >>> wait for >> > > > >> > >>> > > all >> > > > >> > >>> > > > > the brokers to fill in their feature znodes and >> then >> > > pick >> > > > >> the >> > > > >> > >>> highest >> > > > >> > >>> > > > > common versions. But that requires blocking in the >> > > > >> controller >> > > > >> > >>> startup >> > > > >> > >>> > > code >> > > > >> > >>> > > > > until the whole cluster is active (technically, a >> > point >> > > in >> > > > >> time >> > > > >> > >>> which >> > > > >> > >>> > > we >> > > > >> > >>> > > > > never really know that we have reached...) >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > Instead, I think it would be better to have a >> strategy >> > > > like >> > > > >> this: >> > > > >> > >>> > > > > 1. If the controller comes up and there is no >> > /features >> > > > >> znode >> > > > >> > >>> AND the >> > > > >> > >>> > > IBP >> > > > >> > >>> > > > > is less than 2.6, create a /features znode where >> all >> > the >> > > > >> > >>> features are >> > > > >> > >>> > > > > disabled. >> > > > >> > >>> > > > > 2. If the controller comes up and there is no >> > /features >> > > > >> znode >> > > > >> > >>> AND the >> > > > >> > >>> > > IBP >> > > > >> > >>> > > > > is greater than or equal to 2.6, create a /features >> > > znode >> > > > >> where >> > > > >> > >>> all the >> > > > >> > >>> > > > > features are enabled at the highest versions >> supported >> > > by >> > > > >> the >> > > > >> > >>> > > controller. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > People upgrading from the pre-KIP-584 >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584> >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584> >> world >> > > > will >> > > > >> end >> > > > >> > >>> up in >> > > > >> > >>> > > case >> > > > >> > >>> > > > > #1 because they have to do a double roll to >> upgrade, >> > and >> > > > >> during >> > > > >> > >>> the >> > > > >> > >>> > > first >> > > > >> > >>> > > > > roll, the IBP is unchanged. People creating new >> > > clusters >> > > > >> from >> > > > >> > >>> scratch >> > > > >> > >>> > > will >> > > > >> > >>> > > > > end up in case #2, which is what we want since we >> > don't >> > > > >> want a >> > > > >> > >>> brand >> > > > >> > >>> > > new >> > > > >> > >>> > > > > cluster to be using old feature flag versions. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > UpdateFeaturesResponse#ErrorMessage should specify >> > > > >> > >>> nullableVersions >> > > > >> > >>> > > since >> > > > >> > >>> > > > > null is a valid value here >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > Also, in the message format, the tags we use need >> to >> > > start >> > > > >> at 0. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > I don't think we need the >> FEATURE_UPDATE_IN_PROGRESS >> > > error >> > > > >> > >>> code. The >> > > > >> > >>> > > > > controller is basically single-threaded and will >> only >> > do >> > > > >> one of >> > > > >> > >>> these >> > > > >> > >>> > > > > operations at once. Even if it weren't, though, we >> > > could >> > > > >> simply >> > > > >> > >>> block >> > > > >> > >>> > > the >> > > > >> > >>> > > > > second operation behind the first one. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > For updateFeatures, it would be good to specify >> that >> > if >> > > a >> > > > >> single >> > > > >> > >>> > > feature >> > > > >> > >>> > > > > version update in the batch can't be done, none of >> > them >> > > > are >> > > > >> > >>> done. I >> > > > >> > >>> > > think >> > > > >> > >>> > > > > this was the intention, but I wasn't able to find >> it >> > > > >> spelled out >> > > > >> > >>> > > (maybe i >> > > > >> > >>> > > > > missed it). >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > === >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry). >> For >> > > > >> finalized >> > > > >> > >>> > > features, >> > > > >> > >>> > > > > why do we need both min_version_level and >> > > > max_version_level? >> > > > >> > >>> Assuming >> > > > >> > >>> > > that >> > > > >> > >>> > > > > we want all the brokers to be on the same feature >> > > version >> > > > >> level, >> > > > >> > >>> we >> > > > >> > >>> > > really >> > > > >> > >>> > > > > only care about three numbers for each feature, >> right? >> > > > The >> > > > >> > >>> minimum >> > > > >> > >>> > > > > supported version level, the maximum supported >> version >> > > > >> level, >> > > > >> > >>> and the >> > > > >> > >>> > > > > current active version level. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > We don't actually want different brokers to be on >> > > > different >> > > > >> > >>> versions of >> > > > >> > >>> > > > > the same feature, right? So we can just have one >> > number >> > > > for >> > > > >> > >>> current >> > > > >> > >>> > > > > version level, rather than two. At least that's >> what >> > I >> > > > was >> > > > >> > >>> thinking >> > > > >> > >>> > > -- let >> > > > >> > >>> > > > > me know if I missed something. >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > best, >> > > > >> > >>> > > > > Colin >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah wrote: >> > > > >> > >>> > > > > > Thanks for the KIP! +1 (non-binding) >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot < >> > > > >> > >>> dja...@confluent.io> >> > > > >> > >>> > > wrote: >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > > > Great KIP, thanks! +1 (non-binding) >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang Wang < >> > > > >> > >>> wangg...@gmail.com> >> > > > >> > >>> > > > > wrote: >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > > > Thanks for the great KIP Kowshik, +1 >> (binding). >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao < >> > > > >> j...@confluent.io >> > > > >> > >>> > >> > > > >> > >>> > > wrote: >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > > > Hi, Kowshik, >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > Thanks for the KIP. +1 >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > Jun >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik >> > > Prakasam >> > > > < >> > > > >> > >>> > > > > > > > kpraka...@confluent.io> >> > > > >> > >>> > > > > > > > > wrote: >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > > Hi all, >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > I'd like to start a vote for KIP-584 >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584> >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584>. >> The >> > > link >> > > > >> to >> > > > >> > >>> the KIP >> > > > >> > >>> > > can >> > > > >> > >>> > > > > be >> > > > >> > >>> > > > > > > found >> > > > >> > >>> > > > > > > > > > here: >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > >> > > > >> > >>> >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features >> > > > >> > >>> > > > > > > > > > . >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > Thanks! >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > Cheers, >> > > > >> > >>> > > > > > > > > > Kowshik >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > > -- >> > > > >> > >>> > > > > > > > -- Guozhang >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > >> > > > >> > >>> > >> > > > >> > >>> >> > > > >> > >> >> > > > >> > >> > > > >> >> > > > > >> > > > >> > > >> > >> >