Hi, Kowshik, Thanks for the update. Those changes look good to me.
Jun On Tue, Oct 13, 2020 at 4:50 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi all, > > I wanted to let you know that I have made the following minor changes to > the `kafka-features` CLI tool description in the KIP-584 write up. The > purpose is to ensure the design is correct for a few things which came up > during implementation: > > 1. The CLI tool now produces a tab-formatted output instead of JSON. This > aligns with the type of format produced by other admin CLI tools of Kafka, > ex: `kafka-topics`. > 2. Whenever feature updates are performed, the output of the CLI tool shows > the result of each feature update that was applied. > 3. The CLI tool accepts an optional argument `--dry-run` which lets the > user preview the feature updates before applying them. > > The following section of the KIP has been updated with the above changes: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Toolingsupport > > Please let me know if you have any questions. > > > Cheers, > Kowshik > > > On Thu, Oct 8, 2020 at 1:12 AM Kowshik Prakasam <kpraka...@confluent.io> > wrote: > > > Hi Jun, > > > > This is a very good point. I have updated the feature version deprecation > > section mentioning the same: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation > > . > > > > Thank you for the suggestion. > > > > > > Cheers, > > Kowshik > > > > > > On Tue, Oct 6, 2020 at 5:30 PM Jun Rao <j...@confluent.io> wrote: > > > >> Hi, Kowshik, > >> > >> Thanks for the follow up. Both look good to me. > >> > >> For 2, it would be useful to also add that an admin should make sure > that > >> no clients are using a deprecated feature version (e.g. using the client > >> version metric) before deploying a release that deprecates it. > >> > >> Thanks, > >> > >> Jun > >> > >> On Tue, Oct 6, 2020 at 3:46 PM Kowshik Prakasam <kpraka...@confluent.io > > > >> wrote: > >> > >> > Hi Jun, > >> > > >> > I have added the following details in the KIP-584 write up: > >> > > >> > 1. Deployment, IBP deprecation and avoidance of double rolls. This > >> section > >> > talks about the various phases of work that would be required to use > >> this > >> > KIP to eventually avoid Broker double rolls in the cluster (whenever > IBP > >> > values are advanced). Link to section: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Deployment,IBPdeprecationandavoidanceofdoublerolls > >> > . > >> > > >> > 2. Feature version deprecation. This section explains the idea for > >> feature > >> > version deprecation (using highest supported feature min version) > which > >> you > >> > had proposed during code review: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation > >> > . > >> > > >> > Please let me know if you have any questions. > >> > > >> > > >> > Cheers, > >> > Kowshik > >> > > >> > > >> > On Tue, Sep 29, 2020 at 11:07 AM Jun Rao <j...@confluent.io> wrote: > >> > > >> > > Hi, Kowshik, > >> > > > >> > > Thanks for the update. Regarding enabling a single rolling restart > in > >> the > >> > > future, could we sketch out a bit how this will work by treating IBP > >> as a > >> > > feature? For example, IBP currently uses the release version and > this > >> KIP > >> > > uses an integer for versions. How do we bridge the gap between the > >> two? > >> > > Does min.version still make sense for IBP as a feature? > >> > > > >> > > Thanks, > >> > > > >> > > Jun > >> > > > >> > > On Fri, Sep 25, 2020 at 5:57 PM Kowshik Prakasam < > >> kpraka...@confluent.io > >> > > > >> > > wrote: > >> > > > >> > > > Hi Colin, > >> > > > > >> > > > Thanks for the feedback. Those are very good points. I have made > the > >> > > > following changes to the KIP as you had suggested: > >> > > > 1. Included the `timeoutMs` field in the `UpdateFeaturesRequest` > >> > schema. > >> > > > The initial implementation won't be making use of the field, but > we > >> can > >> > > > always use it in the future as the need arises. > >> > > > 2. Modified the `FinalizedFeaturesEpoch` field in > >> `ApiVersionsResponse` > >> > > to > >> > > > use int64. This is to avoid overflow problems in the future once > ZK > >> is > >> > > > gone. > >> > > > > >> > > > I have also incorporated these changes into the versioning write > >> path > >> > PR > >> > > > that is currently under review: > >> > > https://github.com/apache/kafka/pull/9001. > >> > > > > >> > > > > >> > > > Cheers, > >> > > > Kowshik > >> > > > > >> > > > > >> > > > > >> > > > On Fri, Sep 25, 2020 at 4:57 PM Kowshik Prakasam < > >> > kpraka...@confluent.io > >> > > > > >> > > > wrote: > >> > > > > >> > > > > Hi Jun, > >> > > > > > >> > > > > Thanks for the feedback. It's a very good point. I have now > >> modified > >> > > the > >> > > > > KIP-584 write-up "goals" section a bit. It now mentions one of > the > >> > > goals > >> > > > as > >> > > > > enabling rolling upgrades using a single restart (instead of 2). > >> > Also I > >> > > > > have removed the text explicitly aiming for deprecation of IBP. > >> Note > >> > > that > >> > > > > previously under "Potential features in Kafka" the IBP was > >> mentioned > >> > > > under > >> > > > > point (4) as a possible coarse-grained feature. Hopefully, now > >> the 2 > >> > > > > sections of the KIP align with each other well. > >> > > > > > >> > > > > > >> > > > > Cheers, > >> > > > > Kowshik > >> > > > > > >> > > > > > >> > > > > On Fri, Sep 25, 2020 at 2:03 PM Colin McCabe < > cmcc...@apache.org> > >> > > wrote: > >> > > > > > >> > > > >> On Tue, Sep 22, 2020, at 00:43, Kowshik Prakasam wrote: > >> > > > >> > Hi all, > >> > > > >> > > >> > > > >> > I wanted to let you know that I have made the following > >> changes to > >> > > the > >> > > > >> > KIP-584 write up. The purpose is to ensure the design is > >> correct > >> > > for a > >> > > > >> few > >> > > > >> > things which came up during implementation: > >> > > > >> > > >> > > > >> > >> > > > >> Hi Kowshik, > >> > > > >> > >> > > > >> Thanks for the updates. > >> > > > >> > >> > > > >> > > >> > > > >> > 1. Per FeatureUpdate error code: The UPDATE_FEATURES > controller > >> > API > >> > > is > >> > > > >> no > >> > > > >> > longer transactional. Going forward, we allow for individual > >> > > > >> FeatureUpdate > >> > > > >> > to succeed/fail in the request. As a result, the response > >> schema > >> > now > >> > > > >> > contains an error code per FeatureUpdate as well as a > top-level > >> > > error > >> > > > >> code. > >> > > > >> > Overall this is a better design because it better represents > >> the > >> > > > nature > >> > > > >> of > >> > > > >> > the API: each FeatureUpdate in the request is independent of > >> the > >> > > other > >> > > > >> > updates, and the controller can process/apply these > >> independently > >> > to > >> > > > ZK. > >> > > > >> > When an UPDATE_FEATURES request fails, this new design > provides > >> > > better > >> > > > >> > clarity to the caller on which FeatureUpdate could not be > >> applied > >> > > (via > >> > > > >> the > >> > > > >> > individual error codes). In the previous design, we were > >> unable to > >> > > > >> achieve > >> > > > >> > such an increased level of clarity in communicating the error > >> > codes. > >> > > > >> > > >> > > > >> > >> > > > >> OK > >> > > > >> > >> > > > >> > > >> > > > >> > 2. Due to #1, there were some minor changes required to the > >> > proposed > >> > > > >> Admin > >> > > > >> > APIs (describeFeatures and updateFeatures). A few unnecessary > >> > public > >> > > > >> APIs > >> > > > >> > have been removed, and couple essential ones have been added. > >> The > >> > > > latest > >> > > > >> > changes now represent the latest design. > >> > > > >> > > >> > > > >> > 3. The timeoutMs field has been removed from the the > >> > UPDATE_FEATURES > >> > > > API > >> > > > >> > request, since it was not found to be required during > >> > > implementation. > >> > > > >> > > >> > > > >> > >> > > > >> Please don't get rid of timeoutMs. timeoutMs is required if > you > >> > want > >> > > to > >> > > > >> implement the ability to timeout the call if the controller > can't > >> > get > >> > > > to it > >> > > > >> in time. This is important for avoiding congestion collapse > >> where > >> > the > >> > > > >> controller collapses under the weight of lots of retries of the > >> same > >> > > > set of > >> > > > >> calls. > >> > > > >> > >> > > > >> We may not be able to do it in the initial implementation, but > we > >> > will > >> > > > >> eventually implement this for all the controller-bound RPCs. > >> > > > >> > >> > > > >> > > > >> > > > >> > > 2. Finalized feature version epoch data type has been made > >> to be > >> > > > int32 > >> > > > >> > > (instead of int64). The reason is that the epoch value is > the > >> > > value > >> > > > >> of ZK > >> > > > >> > > node version, whose data type is int32. > >> > > > >> > > > >> > > > >> > >> > > > >> Sorry, I missed this earlier. Using 16 bit feature levels > seems > >> > fine. > >> > > > >> However, please don't use a 32-bit epoch here. We deliberately > >> made > >> > > the > >> > > > >> epoch 64 bits to avoid overflow problems in the future once ZK > is > >> > > gone. > >> > > > >> > >> > > > >> best, > >> > > > >> Colin > >> > > > >> > >> > > > >> > > 3. Introduced a new 'status' field in the '/features' ZK > node > >> > > > schema. > >> > > > >> The > >> > > > >> > > purpose is to implement Colin's earlier point for the > >> strategy > >> > for > >> > > > >> > > transitioning from not having a /features znode to having > >> one. > >> > An > >> > > > >> > > explanation has been provided in the following section of > the > >> > KIP > >> > > > >> detailing > >> > > > >> > > the different cases: > >> > > > >> > > > >> > > > >> > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus > >> > > > >> > > . > >> > > > >> > > > >> > > > >> > > Please let me know if you have any questions or concerns. > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > Cheers, > >> > > > >> > > Kowshik > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > Cheers, > >> > > > >> > > Kowshik > >> > > > >> > > > >> > > > >> > > On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam < > >> > > > >> kpraka...@confluent.io> > >> > > > >> > > wrote: > >> > > > >> > > > >> > > > >> > >> Hi all, > >> > > > >> > >> > >> > > > >> > >> This KIP vote has been open for ~12 days. The summary of > the > >> > > votes > >> > > > is > >> > > > >> > >> that we have 3 binding votes (Colin, Guozhang, Jun), and 3 > >> > > > >> non-binding > >> > > > >> > >> votes (David, Dhruvil, Boyang). Therefore, the KIP vote > >> passes. > >> > > > I'll > >> > > > >> mark > >> > > > >> > >> KIP as accepted and start working on the implementation. > >> > > > >> > >> > >> > > > >> > >> Thanks a lot! > >> > > > >> > >> > >> > > > >> > >> > >> > > > >> > >> Cheers, > >> > > > >> > >> Kowshik > >> > > > >> > >> > >> > > > >> > >> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe < > >> > > cmcc...@apache.org> > >> > > > >> wrote: > >> > > > >> > >> > >> > > > >> > >>> Thanks, Kowshik. +1 (binding) > >> > > > >> > >>> > >> > > > >> > >>> best, > >> > > > >> > >>> Colin > >> > > > >> > >>> > >> > > > >> > >>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote: > >> > > > >> > >>> > Hi Colin, > >> > > > >> > >>> > > >> > > > >> > >>> > Thanks for the explanation! I agree with you, and I > have > >> > > updated > >> > > > >> the > >> > > > >> > >>> > KIP. > >> > > > >> > >>> > Here is a link to relevant section: > >> > > > >> > >>> > > >> > > > >> > >>> > >> > > > >> > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues > >> > > > >> > >>> > > >> > > > >> > >>> > > >> > > > >> > >>> > Cheers, > >> > > > >> > >>> > Kowshik > >> > > > >> > >>> > > >> > > > >> > >>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe < > >> > > > cmcc...@apache.org> > >> > > > >> > >>> wrote: > >> > > > >> > >>> > > >> > > > >> > >>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam > wrote: > >> > > > >> > >>> > > > (Kowshik): Great point! However for case #1, I'm > not > >> > sure > >> > > > why > >> > > > >> we > >> > > > >> > >>> need to > >> > > > >> > >>> > > > create a '/features' ZK node with disabled > features. > >> > > > Instead, > >> > > > >> do > >> > > > >> > >>> you see > >> > > > >> > >>> > > > any drawback if we just do not create it? i.e. if > >> IBP is > >> > > > less > >> > > > >> than > >> > > > >> > >>> 2.6, > >> > > > >> > >>> > > the > >> > > > >> > >>> > > > controller treats the case as though the versioning > >> > system > >> > > > is > >> > > > >> > >>> completely > >> > > > >> > >>> > > > disabled, and would not create a non-existing > >> > '/features' > >> > > > >> node. > >> > > > >> > >>> > > > >> > > > >> > >>> > > Hi Kowshik, > >> > > > >> > >>> > > > >> > > > >> > >>> > > When the IBP is less than 2.6, but the software has > >> been > >> > > > >> upgraded to > >> > > > >> > >>> a > >> > > > >> > >>> > > state where it supports this KIP, that > >> > > > >> > >>> > > means the user is upgrading from an earlier version > of > >> > the > >> > > > >> > >>> software. In > >> > > > >> > >>> > > this case, we want to start with all the features > >> disabled > >> > > and > >> > > > >> allow > >> > > > >> > >>> the > >> > > > >> > >>> > > user to enable them when they are ready. > >> > > > >> > >>> > > > >> > > > >> > >>> > > Enabling all the possible features immediately after > an > >> > > > upgrade > >> > > > >> > >>> could be > >> > > > >> > >>> > > harmful to the cluster. On the other hand, for a new > >> > > cluster, > >> > > > >> we do > >> > > > >> > >>> want > >> > > > >> > >>> > > to enable all the possible features immediately . I > was > >> > > > >> proposing > >> > > > >> > >>> this as a > >> > > > >> > >>> > > way to distinguish the two cases (since the new > cluster > >> > will > >> > > > >> never be > >> > > > >> > >>> > > started with an old IBP). > >> > > > >> > >>> > > > >> > > > >> > >>> > > > Colin MccCabe wrote: > >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry). > >> For > >> > > > >> finalized > >> > > > >> > >>> > > features, > >> > > > >> > >>> > > > > why do we need both min_version_level and > >> > > > max_version_level? > >> > > > >> > >>> Assuming > >> > > > >> > >>> > > that > >> > > > >> > >>> > > > > we want all the brokers to be on the same feature > >> > > version > >> > > > >> level, > >> > > > >> > >>> we > >> > > > >> > >>> > > really only care > >> > > > >> > >>> > > > > about three numbers for each feature, right? The > >> > > minimum > >> > > > >> > >>> supported > >> > > > >> > >>> > > version > >> > > > >> > >>> > > > > level, the maximum supported version level, and > the > >> > > > current > >> > > > >> > >>> active > >> > > > >> > >>> > > version level. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > We don't actually want different brokers to be on > >> > > > different > >> > > > >> > >>> versions of > >> > > > >> > >>> > > > > the same feature, right? So we can just have one > >> > number > >> > > > for > >> > > > >> > >>> current > >> > > > >> > >>> > > > > version level, rather than two. At least that's > >> what > >> > I > >> > > > was > >> > > > >> > >>> thinking > >> > > > >> > >>> > > -- let > >> > > > >> > >>> > > > > me know if I missed something. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > (Kowshik): It is my understanding that the "current > >> > active > >> > > > >> version > >> > > > >> > >>> level" > >> > > > >> > >>> > > > that you have mentioned, is the > "max_version_level". > >> But > >> > > we > >> > > > >> still > >> > > > >> > >>> > > > maintain/publish both min and max version levels, > >> > because, > >> > > > the > >> > > > >> > >>> detail > >> > > > >> > >>> > > about > >> > > > >> > >>> > > > min level is useful to external clients. This is > >> > described > >> > > > >> below. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > For any feature F, think of the closed range: > >> > > > >> [min_version_level, > >> > > > >> > >>> > > > max_version_level] as the range of finalized > >> versions, > >> > > > that's > >> > > > >> > >>> guaranteed > >> > > > >> > >>> > > to > >> > > > >> > >>> > > > be supported by all brokers in the cluster. > >> > > > >> > >>> > > > - "max_version_level" is the finalized highest > >> common > >> > > > version > >> > > > >> > >>> among all > >> > > > >> > >>> > > > brokers, > >> > > > >> > >>> > > > - "min_version_level" is the finalized lowest > common > >> > > > version > >> > > > >> > >>> among all > >> > > > >> > >>> > > > brokers. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > Next, think of "client" here as the "user of the > new > >> > > feature > >> > > > >> > >>> versions > >> > > > >> > >>> > > > system". Imagine that such a client learns about > >> > finalized > >> > > > >> feature > >> > > > >> > >>> > > > versions, and exercises some logic based on the > >> version. > >> > > > These > >> > > > >> > >>> clients > >> > > > >> > >>> > > can > >> > > > >> > >>> > > > be of 2 types: > >> > > > >> > >>> > > > 1. Some part of the broker code itself could behave > >> > like a > >> > > > >> client > >> > > > >> > >>> trying > >> > > > >> > >>> > > to > >> > > > >> > >>> > > > use some feature that's "internal" to the broker > >> > cluster. > >> > > > >> Such a > >> > > > >> > >>> client > >> > > > >> > >>> > > > would learn the latest finalized features via ZK. > >> > > > >> > >>> > > > 2. An external system (ex: Streams) could behave > >> like a > >> > > > >> client, > >> > > > >> > >>> trying to > >> > > > >> > >>> > > > use some "external" facing feature. Such a client > >> would > >> > > > learn > >> > > > >> > >>> latest > >> > > > >> > >>> > > > finalized features via ApiVersionsRequest. Ex: > >> > > > >> group_coordinator > >> > > > >> > >>> feature > >> > > > >> > >>> > > > described in the KIP. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > Next, imagine that for F, the max_version_level is > >> > > > >> successfully > >> > > > >> > >>> bumped by > >> > > > >> > >>> > > > +1 (via Controller API). Now it is guaranteed that > >> all > >> > > > brokers > >> > > > >> > >>> (i.e. > >> > > > >> > >>> > > > internal clients) understand max_version_level + 1. > >> > > However, > >> > > > >> it is > >> > > > >> > >>> still > >> > > > >> > >>> > > > not guaranteed that all external clients have > support > >> > for > >> > > > (or > >> > > > >> have > >> > > > >> > >>> > > > activated) the logic for the newer version. Why? > >> > Because, > >> > > > >> this is > >> > > > >> > >>> > > > subjective as explained next: > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > 1. On one hand, imagine F as an internal feature > only > >> > > > >> relevant to > >> > > > >> > >>> > > Brokers. > >> > > > >> > >>> > > > The binary for the internal client logic is > >> controlled > >> > by > >> > > > >> Broker > >> > > > >> > >>> cluster > >> > > > >> > >>> > > > deployments. When shipping a new Broker release, we > >> > > wouldn't > >> > > > >> bump > >> > > > >> > >>> max > >> > > > >> > >>> > > > "supported" feature version for F by 1, unless we > >> have > >> > > > >> introduced > >> > > > >> > >>> some > >> > > > >> > >>> > > new > >> > > > >> > >>> > > > logic (with a potentially breaking change) in the > >> > Broker. > >> > > > >> > >>> Furthermore, > >> > > > >> > >>> > > such > >> > > > >> > >>> > > > feature logic in the broker should/will not be > >> > implemented > >> > > > in > >> > > > >> a > >> > > > >> > >>> way that > >> > > > >> > >>> > > it > >> > > > >> > >>> > > > would activate logic for an older feature version > >> after > >> > it > >> > > > has > >> > > > >> > >>> migrated > >> > > > >> > >>> > > to > >> > > > >> > >>> > > > using the logic for a newer feature version > (because > >> > this > >> > > > >> could > >> > > > >> > >>> break the > >> > > > >> > >>> > > > cluster!). For these cases, max_version_level will > be > >> > very > >> > > > >> useful > >> > > > >> > >>> for > >> > > > >> > >>> > > > decision making. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > 2. On the other hand, imagine F as an external > facing > >> > > > feature. > >> > > > >> > >>> External > >> > > > >> > >>> > > > clients are not within the control of Broker > >> cluster. An > >> > > > >> external > >> > > > >> > >>> client > >> > > > >> > >>> > > > may not have upgraded it's code (yet) to use > >> > > > >> 'max_version_level + > >> > > > >> > >>> 1'. > >> > > > >> > >>> > > But, > >> > > > >> > >>> > > > the Kafka cluster could have been deployed with > >> support > >> > > for > >> > > > >> > >>> > > > 'max_version_level + 1' of an external facing > >> feature. > >> > > Now, > >> > > > >> these > >> > > > >> > >>> > > external > >> > > > >> > >>> > > > clients (until an upgrade) are benefitted in > learning > >> > > "what > >> > > > >> is the > >> > > > >> > >>> lowest > >> > > > >> > >>> > > > common version for F among all brokers?". This is > >> where > >> > > the > >> > > > >> > >>> > > > "min_version_level" becomes useful. Using this, a > >> client > >> > > > could > >> > > > >> > >>> learn the > >> > > > >> > >>> > > > specific supported versions that's lower than > >> > > > >> max_version_level > >> > > > >> > >>> (instead > >> > > > >> > >>> > > of > >> > > > >> > >>> > > > assuming that all brokers support the range: [1, > >> > > > >> > >>> max_version_level]). For > >> > > > >> > >>> > > > example, if the cluster deprecates > >> "min_version_level", > >> > > then > >> > > > >> the > >> > > > >> > >>> client > >> > > > >> > >>> > > > becomes aware because it periodically learns the > >> latest > >> > > > >> > >>> > > "min_version_level" > >> > > > >> > >>> > > > via ApiVersionsRequest. > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > Thanks for the explanation. I agree that this does > >> make > >> > > sense > >> > > > >> when > >> > > > >> > >>> you > >> > > > >> > >>> > > take the client perspective into account. > >> > > > >> > >>> > > > >> > > > >> > >>> > > best, > >> > > > >> > >>> > > Colin > >> > > > >> > >>> > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > Cheers, > >> > > > >> > >>> > > > Kowshik > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe < > >> > > > >> cmcc...@apache.org> > >> > > > >> > >>> > > wrote: > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > > Hi Kowshik, > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > Thanks again for working on this-- it looks > >> great. I > >> > > went > >> > > > >> over > >> > > > >> > >>> the KIP > >> > > > >> > >>> > > > > again and have a few more comments. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > It would be good to note that deprecating a > feature > >> > > > version > >> > > > >> (in > >> > > > >> > >>> other > >> > > > >> > >>> > > > > words, increasing minVersionLevel on the broker) > >> is an > >> > > > >> > >>> incompatible > >> > > > >> > >>> > > change, > >> > > > >> > >>> > > > > which requires a major release of Kafka. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > I think the strategy for transitioning from not > >> > having a > >> > > > >> > >>> /features > >> > > > >> > >>> > > znode > >> > > > >> > >>> > > > > to having one could use some work. The current > >> > proposal > >> > > is > >> > > > >> to > >> > > > >> > >>> wait for > >> > > > >> > >>> > > all > >> > > > >> > >>> > > > > the brokers to fill in their feature znodes and > >> then > >> > > pick > >> > > > >> the > >> > > > >> > >>> highest > >> > > > >> > >>> > > > > common versions. But that requires blocking in > the > >> > > > >> controller > >> > > > >> > >>> startup > >> > > > >> > >>> > > code > >> > > > >> > >>> > > > > until the whole cluster is active (technically, a > >> > point > >> > > in > >> > > > >> time > >> > > > >> > >>> which > >> > > > >> > >>> > > we > >> > > > >> > >>> > > > > never really know that we have reached...) > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > Instead, I think it would be better to have a > >> strategy > >> > > > like > >> > > > >> this: > >> > > > >> > >>> > > > > 1. If the controller comes up and there is no > >> > /features > >> > > > >> znode > >> > > > >> > >>> AND the > >> > > > >> > >>> > > IBP > >> > > > >> > >>> > > > > is less than 2.6, create a /features znode where > >> all > >> > the > >> > > > >> > >>> features are > >> > > > >> > >>> > > > > disabled. > >> > > > >> > >>> > > > > 2. If the controller comes up and there is no > >> > /features > >> > > > >> znode > >> > > > >> > >>> AND the > >> > > > >> > >>> > > IBP > >> > > > >> > >>> > > > > is greater than or equal to 2.6, create a > /features > >> > > znode > >> > > > >> where > >> > > > >> > >>> all the > >> > > > >> > >>> > > > > features are enabled at the highest versions > >> supported > >> > > by > >> > > > >> the > >> > > > >> > >>> > > controller. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > People upgrading from the pre-KIP-584 > >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584> > >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584> > >> world > >> > > > will > >> > > > >> end > >> > > > >> > >>> up in > >> > > > >> > >>> > > case > >> > > > >> > >>> > > > > #1 because they have to do a double roll to > >> upgrade, > >> > and > >> > > > >> during > >> > > > >> > >>> the > >> > > > >> > >>> > > first > >> > > > >> > >>> > > > > roll, the IBP is unchanged. People creating new > >> > > clusters > >> > > > >> from > >> > > > >> > >>> scratch > >> > > > >> > >>> > > will > >> > > > >> > >>> > > > > end up in case #2, which is what we want since we > >> > don't > >> > > > >> want a > >> > > > >> > >>> brand > >> > > > >> > >>> > > new > >> > > > >> > >>> > > > > cluster to be using old feature flag versions. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > UpdateFeaturesResponse#ErrorMessage should > specify > >> > > > >> > >>> nullableVersions > >> > > > >> > >>> > > since > >> > > > >> > >>> > > > > null is a valid value here > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > Also, in the message format, the tags we use need > >> to > >> > > start > >> > > > >> at 0. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > I don't think we need the > >> FEATURE_UPDATE_IN_PROGRESS > >> > > error > >> > > > >> > >>> code. The > >> > > > >> > >>> > > > > controller is basically single-threaded and will > >> only > >> > do > >> > > > >> one of > >> > > > >> > >>> these > >> > > > >> > >>> > > > > operations at once. Even if it weren't, though, > we > >> > > could > >> > > > >> simply > >> > > > >> > >>> block > >> > > > >> > >>> > > the > >> > > > >> > >>> > > > > second operation behind the first one. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > For updateFeatures, it would be good to specify > >> that > >> > if > >> > > a > >> > > > >> single > >> > > > >> > >>> > > feature > >> > > > >> > >>> > > > > version update in the batch can't be done, none > of > >> > them > >> > > > are > >> > > > >> > >>> done. I > >> > > > >> > >>> > > think > >> > > > >> > >>> > > > > this was the intention, but I wasn't able to find > >> it > >> > > > >> spelled out > >> > > > >> > >>> > > (maybe i > >> > > > >> > >>> > > > > missed it). > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > === > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry). > >> For > >> > > > >> finalized > >> > > > >> > >>> > > features, > >> > > > >> > >>> > > > > why do we need both min_version_level and > >> > > > max_version_level? > >> > > > >> > >>> Assuming > >> > > > >> > >>> > > that > >> > > > >> > >>> > > > > we want all the brokers to be on the same feature > >> > > version > >> > > > >> level, > >> > > > >> > >>> we > >> > > > >> > >>> > > really > >> > > > >> > >>> > > > > only care about three numbers for each feature, > >> right? > >> > > > The > >> > > > >> > >>> minimum > >> > > > >> > >>> > > > > supported version level, the maximum supported > >> version > >> > > > >> level, > >> > > > >> > >>> and the > >> > > > >> > >>> > > > > current active version level. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > We don't actually want different brokers to be on > >> > > > different > >> > > > >> > >>> versions of > >> > > > >> > >>> > > > > the same feature, right? So we can just have one > >> > number > >> > > > for > >> > > > >> > >>> current > >> > > > >> > >>> > > > > version level, rather than two. At least that's > >> what > >> > I > >> > > > was > >> > > > >> > >>> thinking > >> > > > >> > >>> > > -- let > >> > > > >> > >>> > > > > me know if I missed something. > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > best, > >> > > > >> > >>> > > > > Colin > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah > wrote: > >> > > > >> > >>> > > > > > Thanks for the KIP! +1 (non-binding) > >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot < > >> > > > >> > >>> dja...@confluent.io> > >> > > > >> > >>> > > wrote: > >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > > Great KIP, thanks! +1 (non-binding) > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang > Wang < > >> > > > >> > >>> wangg...@gmail.com> > >> > > > >> > >>> > > > > wrote: > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > > Thanks for the great KIP Kowshik, +1 > >> (binding). > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao < > >> > > > >> j...@confluent.io > >> > > > >> > >>> > > >> > > > >> > >>> > > wrote: > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > Hi, Kowshik, > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > Thanks for the KIP. +1 > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > Jun > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik > >> > > Prakasam > >> > > > < > >> > > > >> > >>> > > > > > > > kpraka...@confluent.io> > >> > > > >> > >>> > > > > > > > > wrote: > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > > Hi all, > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > I'd like to start a vote for KIP-584 > >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584> > >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584>. > >> The > >> > > link > >> > > > >> to > >> > > > >> > >>> the KIP > >> > > > >> > >>> > > can > >> > > > >> > >>> > > > > be > >> > > > >> > >>> > > > > > > found > >> > > > >> > >>> > > > > > > > > > here: > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > >> > > > >> > >>> > >> > > > >> > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > >> > > > >> > >>> > > > > > > > > > . > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > Thanks! > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > Cheers, > >> > > > >> > >>> > > > > > > > > > Kowshik > >> > > > >> > >>> > > > > > > > > > > >> > > > >> > >>> > > > > > > > > > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > -- > >> > > > >> > >>> > > > > > > > -- Guozhang > >> > > > >> > >>> > > > > > > > > >> > > > >> > >>> > > > > > > > >> > > > >> > >>> > > > > > > >> > > > >> > >>> > > > > > >> > > > >> > >>> > > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > >> > > > >> > >>> > >> > > > >> > >> > >> > > > >> > > >> > > > >> > >> > > > > > >> > > > > >> > > > >> > > >> > > >