Hi Colin, Thanks for the explanation! I agree with you, and I have updated the KIP. Here is a link to relevant section: https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
Cheers, Kowshik On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe <cmcc...@apache.org> wrote: > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam wrote: > > (Kowshik): Great point! However for case #1, I'm not sure why we need to > > create a '/features' ZK node with disabled features. Instead, do you see > > any drawback if we just do not create it? i.e. if IBP is less than 2.6, > the > > controller treats the case as though the versioning system is completely > > disabled, and would not create a non-existing '/features' node. > > Hi Kowshik, > > When the IBP is less than 2.6, but the software has been upgraded to a > state where it supports this KIP, that > means the user is upgrading from an earlier version of the software. In > this case, we want to start with all the features disabled and allow the > user to enable them when they are ready. > > Enabling all the possible features immediately after an upgrade could be > harmful to the cluster. On the other hand, for a new cluster, we do want > to enable all the possible features immediately . I was proposing this as a > way to distinguish the two cases (since the new cluster will never be > started with an old IBP). > > > Colin MccCabe wrote: > > > And now, something a little bit bigger (sorry). For finalized > features, > > > why do we need both min_version_level and max_version_level? Assuming > that > > > we want all the brokers to be on the same feature version level, we > really only care > > > about three numbers for each feature, right? The minimum supported > version > > > level, the maximum supported version level, and the current active > version level. > > > > > We don't actually want different brokers to be on different versions of > > > the same feature, right? So we can just have one number for current > > > version level, rather than two. At least that's what I was thinking > -- let > > > me know if I missed something. > > > > (Kowshik): It is my understanding that the "current active version level" > > that you have mentioned, is the "max_version_level". But we still > > maintain/publish both min and max version levels, because, the detail > about > > min level is useful to external clients. This is described below. > > > > For any feature F, think of the closed range: [min_version_level, > > max_version_level] as the range of finalized versions, that's guaranteed > to > > be supported by all brokers in the cluster. > > - "max_version_level" is the finalized highest common version among all > > brokers, > > - "min_version_level" is the finalized lowest common version among all > > brokers. > > > > Next, think of "client" here as the "user of the new feature versions > > system". Imagine that such a client learns about finalized feature > > versions, and exercises some logic based on the version. These clients > can > > be of 2 types: > > 1. Some part of the broker code itself could behave like a client trying > to > > use some feature that's "internal" to the broker cluster. Such a client > > would learn the latest finalized features via ZK. > > 2. An external system (ex: Streams) could behave like a client, trying to > > use some "external" facing feature. Such a client would learn latest > > finalized features via ApiVersionsRequest. Ex: group_coordinator feature > > described in the KIP. > > > > Next, imagine that for F, the max_version_level is successfully bumped by > > +1 (via Controller API). Now it is guaranteed that all brokers (i.e. > > internal clients) understand max_version_level + 1. However, it is still > > not guaranteed that all external clients have support for (or have > > activated) the logic for the newer version. Why? Because, this is > > subjective as explained next: > > > > 1. On one hand, imagine F as an internal feature only relevant to > Brokers. > > The binary for the internal client logic is controlled by Broker cluster > > deployments. When shipping a new Broker release, we wouldn't bump max > > "supported" feature version for F by 1, unless we have introduced some > new > > logic (with a potentially breaking change) in the Broker. Furthermore, > such > > feature logic in the broker should/will not be implemented in a way that > it > > would activate logic for an older feature version after it has migrated > to > > using the logic for a newer feature version (because this could break the > > cluster!). For these cases, max_version_level will be very useful for > > decision making. > > > > 2. On the other hand, imagine F as an external facing feature. External > > clients are not within the control of Broker cluster. An external client > > may not have upgraded it's code (yet) to use 'max_version_level + 1'. > But, > > the Kafka cluster could have been deployed with support for > > 'max_version_level + 1' of an external facing feature. Now, these > external > > clients (until an upgrade) are benefitted in learning "what is the lowest > > common version for F among all brokers?". This is where the > > "min_version_level" becomes useful. Using this, a client could learn the > > specific supported versions that's lower than max_version_level (instead > of > > assuming that all brokers support the range: [1, max_version_level]). For > > example, if the cluster deprecates "min_version_level", then the client > > becomes aware because it periodically learns the latest > "min_version_level" > > via ApiVersionsRequest. > > > > Thanks for the explanation. I agree that this does make sense when you > take the client perspective into account. > > best, > Colin > > > > > > Cheers, > > Kowshik > > > > > > > > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe <cmcc...@apache.org> > wrote: > > > > > Hi Kowshik, > > > > > > Thanks again for working on this-- it looks great. I went over the KIP > > > again and have a few more comments. > > > > > > === > > > > > > It would be good to note that deprecating a feature version (in other > > > words, increasing minVersionLevel on the broker) is an incompatible > change, > > > which requires a major release of Kafka. > > > > > > === > > > > > > I think the strategy for transitioning from not having a /features > znode > > > to having one could use some work. The current proposal is to wait for > all > > > the brokers to fill in their feature znodes and then pick the highest > > > common versions. But that requires blocking in the controller startup > code > > > until the whole cluster is active (technically, a point in time which > we > > > never really know that we have reached...) > > > > > > Instead, I think it would be better to have a strategy like this: > > > 1. If the controller comes up and there is no /features znode AND the > IBP > > > is less than 2.6, create a /features znode where all the features are > > > disabled. > > > 2. If the controller comes up and there is no /features znode AND the > IBP > > > is greater than or equal to 2.6, create a /features znode where all the > > > features are enabled at the highest versions supported by the > controller. > > > > > > People upgrading from the pre-KIP-584 > > > <https://issues.apache.org/jira/browse/KIP-584> world will end up in > case > > > #1 because they have to do a double roll to upgrade, and during the > first > > > roll, the IBP is unchanged. People creating new clusters from scratch > will > > > end up in case #2, which is what we want since we don't want a brand > new > > > cluster to be using old feature flag versions. > > > > > > === > > > > > > UpdateFeaturesResponse#ErrorMessage should specify nullableVersions > since > > > null is a valid value here > > > > > > Also, in the message format, the tags we use need to start at 0. > > > > > > === > > > > > > I don't think we need the FEATURE_UPDATE_IN_PROGRESS error code. The > > > controller is basically single-threaded and will only do one of these > > > operations at once. Even if it weren't, though, we could simply block > the > > > second operation behind the first one. > > > > > > === > > > > > > For updateFeatures, it would be good to specify that if a single > feature > > > version update in the batch can't be done, none of them are done. I > think > > > this was the intention, but I wasn't able to find it spelled out > (maybe i > > > missed it). > > > > > > === > > > > > > And now, something a little bit bigger (sorry). For finalized > features, > > > why do we need both min_version_level and max_version_level? Assuming > that > > > we want all the brokers to be on the same feature version level, we > really > > > only care about three numbers for each feature, right? The minimum > > > supported version level, the maximum supported version level, and the > > > current active version level. > > > > > > We don't actually want different brokers to be on different versions of > > > the same feature, right? So we can just have one number for current > > > version level, rather than two. At least that's what I was thinking > -- let > > > me know if I missed something. > > > > > > best, > > > Colin > > > > > > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah wrote: > > > > Thanks for the KIP! +1 (non-binding) > > > > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot <dja...@confluent.io> > wrote: > > > > > > > > > Great KIP, thanks! +1 (non-binding) > > > > > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang Wang <wangg...@gmail.com> > > > wrote: > > > > > > > > > > > Thanks for the great KIP Kowshik, +1 (binding). > > > > > > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao <j...@confluent.io> > wrote: > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > Thanks for the KIP. +1 > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik Prakasam < > > > > > > kpraka...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I'd like to start a vote for KIP-584 > > > <https://issues.apache.org/jira/browse/KIP-584>. The link to the KIP > can > > > be > > > > > found > > > > > > > > here: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > > > > > > > > . > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > >