Hi, Kowshik, Thanks for the reply. Makes sense. Just one more question.
200. My understanding is that If the CLI tool passes the '--allow-downgrade' flag when updating a specific feature, then a future downgrade is possible. Otherwise, the feature is now downgradable. If so, I was wondering how the controller remembers this since it can be restarted over time? Jun On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > Thanks a lot for the feedback and the questions! > Please find my response below. > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems > > that field needs to be persisted somewhere in ZK? > > (Kowshik): Great question! Below is my explanation. Please help me > understand, > if you feel there are cases where we would need to still persist it in ZK. > > Firstly I have updated my thoughts into the KIP now, under the 'guidelines' > section: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > > The allowDowngrade boolean field is just to restrict the user intent, and > to remind > them to double check their intent before proceeding. It should be set to > true > by the user in a request, only when the user intent is to forcefully > "attempt" a > downgrade of a specific feature's max version level, to the provided value > in > the request. > > We can extend this safeguard. The controller (on it's end) can maintain > rules in the code, that, for safety reasons would outright reject certain > downgrades > from a specific max_version_level for a specific feature. Such rejections > may > happen depending on the feature being downgraded, and from what version > level. > > The CLI tool only allows a downgrade attempt in conjunction with specific > flags and sub-commands. For example, in the CLI tool, if the user uses the > 'downgrade-all' command, or passes '--allow-downgrade' flag when updating a > specific feature, only then the tool will translate this ask to setting > 'allowDowngrade' field in the request to the server. > > > 201. UpdateFeaturesResponse has the following top level fields. Should > > those fields be per feature? > > > > "fields": [ > > { "name": "ErrorCode", "type": "int16", "versions": "0+", > > "about": "The error code, or 0 if there was no error." }, > > { "name": "ErrorMessage", "type": "string", "versions": "0+", > > "about": "The error message, or null if there was no error." } > > ] > > (Kowshik): Great question! > As such, the API is transactional, as explained in the sections linked > below. > Either all provided FeatureUpdate was applied, or none. > It's the reason I felt we can have just one error code + message. > Happy to extend this if you feel otherwise. Please let me know. > > Link to sections: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees > > > 202. The /features path in ZK has a field min_version_level. Which API > and > > tool can change that value? > > (Kowshik): Great question! Currently this cannot be modified by using the > API or the tool. > Feature version deprecation (by raising min_version_level) can be done only > by the Controller directly. The rationale is explained in this section: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation > > > Cheers, > Kowshik > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <j...@confluent.io> wrote: > > > Hi, Kowshik, > > > > Thanks for addressing those comments. Just a few more minor comments. > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. It seems > > that field needs to be persisted somewhere in ZK? > > > > 201. UpdateFeaturesResponse has the following top level fields. Should > > those fields be per feature? > > > > "fields": [ > > { "name": "ErrorCode", "type": "int16", "versions": "0+", > > "about": "The error code, or 0 if there was no error." }, > > { "name": "ErrorMessage", "type": "string", "versions": "0+", > > "about": "The error message, or null if there was no error." } > > ] > > > > 202. The /features path in ZK has a field min_version_level. Which API > and > > tool can change that value? > > > > Jun > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <kpraka...@confluent.io > > > > wrote: > > > > > Hi Jun, > > > > > > Thanks for the feedback! I have updated the KIP-584 addressing your > > > comments. > > > Please find my response below. > > > > > > > 100.6 You can look for the sentence "This operation requires ALTER on > > > > CLUSTER." in KIP-455. Also, you can check its usage in > > > > KafkaApis.authorize(). > > > > > > (Kowshik): Done. Great point! For the newly introduced UPDATE_FEATURES > > api, > > > I have added a > > > requirement that AclOperation.ALTER is required on > ResourceType.CLUSTER. > > > > > > > 110. Keeping the feature version as int is probably fine. I just felt > > > that > > > > for some of the common user interactions, it's more convenient to > > > > relate that to a release version. For example, if a user wants to > > > downgrade > > > > to a release 2.5, it's easier for the user to use the tool like "tool > > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version > 6". > > > > > > (Kowshik): Great point. Generally, maximum feature version levels are > not > > > downgradable after > > > they are finalized in the cluster. This is because, as a guideline > > bumping > > > feature version level usually is used mainly to convey important > breaking > > > changes. > > > Despite the above, there may be some extreme/rare cases where a user > > wants > > > to downgrade > > > all features to a specific previous release. The user may want to do > this > > > just > > > prior to rolling back a Kafka cluster to a previous release. > > > > > > To support the above, I have made a change to the KIP explaining that > the > > > CLI tool is versioned. > > > The CLI tool internally has knowledge about a map of features to their > > > respective max > > > versions supported by the Broker. The tool's knowledge of features and > > > their version values, > > > is limited to the version of the CLI tool itself i.e. the information > is > > > packaged into the CLI tool > > > when it is released. Whenever a Kafka release introduces a new feature > > > version, or modifies > > > an existing feature version, the CLI tool shall also be updated with > this > > > information, > > > Newer versions of the CLI tool will be released as part of the Kafka > > > releases. > > > > > > Therefore, to achieve the downgrade need, the user just needs to run > the > > > version of > > > the CLI tool that's part of the particular previous release that he/she > > is > > > downgrading to. > > > To help the user with this, there is a new command added to the CLI > tool > > > called `downgrade-all`. > > > This essentially downgrades max version levels of all features in the > > > cluster to the versions > > > known to the CLI tool internally. > > > > > > I have explained the above in the KIP under these sections: > > > > > > Tooling support (have explained that the CLI tool is versioned): > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > > > > > > Regular CLI tool usage (please refer to point #3, and see the tooling > > > example) > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > > > > > > > 110. Similarly, if the client library finds a feature mismatch with > the > > > broker, > > > > the client likely needs to log some error message for the user to > take > > > some > > > > actions. It's much more actionable if the error message is "upgrade > the > > > > broker to release version 2.6" than just "upgrade the broker to > feature > > > > version 7". > > > > > > (Kowshik): That's a really good point! If we use ints for feature > > versions, > > > the best > > > message that client can print for debugging is "broker doesn't support > > > feature version 7", and alongside that print the supported version > range > > > returned > > > by the broker. Then, does it sound reasonable that the user could then > > > reference > > > Kafka release logs to figure out which version of the broker release is > > > required > > > be deployed, to support feature version 7? I couldn't think of a better > > > strategy here. > > > > > > > 120. When should a developer bump up the version of a feature? > > > > > > (Kowshik): Great question! In the KIP, I have added a section: > > 'Guidelines > > > on feature versions and workflows' > > > providing some guidelines on when to use the versioned feature flags, > and > > > what > > > are the regular workflows with the CLI tool. > > > > > > Link to the relevant sections: > > > Guidelines: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > > > > > > Regular CLI tool usage: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > > > > > > Advanced CLI tool usage: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage > > > > > > > > > Cheers, > > > Kowshik > > > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <j...@confluent.io> wrote: > > > > > > > Hi, Kowshik, > > > > > > > > Thanks for the reply. A few more comments. > > > > > > > > 110. Keeping the feature version as int is probably fine. I just felt > > > that > > > > for some of the common user interactions, it's more convenient to > > > > relate that to a release version. For example, if a user wants to > > > downgrade > > > > to a release 2.5, it's easier for the user to use the tool like "tool > > > > --downgrade 2.5" instead of "tool --downgrade --feature X --version > 6". > > > > Similarly, if the client library finds a feature mismatch with the > > > broker, > > > > the client likely needs to log some error message for the user to > take > > > some > > > > actions. It's much more actionable if the error message is "upgrade > the > > > > broker to release version 2.6" than just "upgrade the broker to > feature > > > > version 7". > > > > > > > > 111. Sounds good. > > > > > > > > 120. When should a developer bump up the version of a feature? > > > > > > > > Jun > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam < > > kpraka...@confluent.io > > > > > > > > wrote: > > > > > > > > > Hi Jun, > > > > > > > > > > I have updated the KIP for the item 111. > > > > > I'm in the process of addressing 100.6, and will provide an update > > > soon. > > > > > I think item 110 is still under discussion given we are now > > providing a > > > > way > > > > > to finalize > > > > > all features to their latest version levels. In any case, please > let > > us > > > > > know > > > > > how you feel in response to Colin's comments on this topic. > > > > > > > > > > > 111. To put this in context, when we had IBP, the default value > is > > > the > > > > > > current released version. So, if you are a brand new user, you > > don't > > > > need > > > > > > to configure IBP and all new features will be immediately > available > > > in > > > > > the > > > > > > new cluster. If you are upgrading from an old version, you do > need > > to > > > > > > understand and configure IBP. I see a similar pattern here for > > > > > > features. From the ease of use perspective, ideally, we shouldn't > > > > require > > > > > a > > > > > > new user to have an extra step such as running a bootstrap script > > > > unless > > > > > > it's truly necessary. If someone has a special need (all the > cases > > > you > > > > > > mentioned seem special cases?), they can configure a mode such > that > > > > > > features are enabled/disabled manually. > > > > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I didn't > > > > > understand > > > > > this need earlier. I have updated the KIP with the approach that > > > whenever > > > > > the '/features' node is absent, the controller by default will > > > bootstrap > > > > > the node > > > > > to contain the latest feature levels. Here is the new section in > the > > > KIP > > > > > describing > > > > > the same: > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues > > > > > > > > > > Next, as I explained in my response to Colin's suggestions, we are > > now > > > > > providing a `--finalize-latest-features` flag with the tooling. > This > > > lets > > > > > the sysadmin finalize all features known to the controller to their > > > > latest > > > > > version > > > > > levels. Please look at this section (point #3 and the tooling > example > > > > > later): > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > > > > > > > > > > > > > > > Do you feel this addresses your comment/concern? > > > > > > > > > > > > > > > Cheers, > > > > > Kowshik > > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <j...@confluent.io> wrote: > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > Thanks for the reply. A few more replies below. > > > > > > > > > > > > 100.6 You can look for the sentence "This operation requires > ALTER > > on > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in > > > > > > KafkaApis.authorize(). > > > > > > > > > > > > 110. From the external client/tooling perspective, it's more > > natural > > > to > > > > > use > > > > > > the release version for features. If we can use the same release > > > > version > > > > > > for internal representation, it seems simpler (easier to > > understand, > > > no > > > > > > mapping overhead, etc). Is there a benefit with separate external > > and > > > > > > internal versioning schemes? > > > > > > > > > > > > 111. To put this in context, when we had IBP, the default value > is > > > the > > > > > > current released version. So, if you are a brand new user, you > > don't > > > > need > > > > > > to configure IBP and all new features will be immediately > available > > > in > > > > > the > > > > > > new cluster. If you are upgrading from an old version, you do > need > > to > > > > > > understand and configure IBP. I see a similar pattern here for > > > > > > features. From the ease of use perspective, ideally, we shouldn't > > > > > require a > > > > > > new user to have an extra step such as running a bootstrap script > > > > unless > > > > > > it's truly necessary. If someone has a special need (all the > cases > > > you > > > > > > mentioned seem special cases?), they can configure a mode such > that > > > > > > features are enabled/disabled manually. > > > > > > > > > > > > Jun > > > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam < > > > > kpraka...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > Thanks for the feedback and suggestions. Please find my > response > > > > below. > > > > > > > > > > > > > > > 100.6 For every new request, the admin needs to control who > is > > > > > allowed > > > > > > to > > > > > > > > issue that request if security is enabled. So, we need to > > assign > > > > the > > > > > > new > > > > > > > > request a ResourceType and possible AclOperations. See > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > > > > > > > > as an example. > > > > > > > > > > > > > > (Kowshik): I don't see any reference to the words ResourceType > or > > > > > > > AclOperations > > > > > > > in the KIP. Please let me know how I can use the KIP that you > > > linked > > > > to > > > > > > > know how to > > > > > > > setup the appropriate ResourceType and/or ClusterOperation? > > > > > > > > > > > > > > > 105. If we change delete to disable, it's better to do this > > > > > > consistently > > > > > > > in > > > > > > > > request protocol and admin api as well. > > > > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when it is > > > deleting > > > > a > > > > > > > feature. > > > > > > > I've just changed the KIP to use 'delete'. I don't have a > strong > > > > > > > preference. > > > > > > > > > > > > > > > 110. The minVersion/maxVersion for features use int64. > > Currently, > > > > our > > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). > It's > > > > > > possible > > > > > > > > for new features to be included in minor releases too. Should > > we > > > > make > > > > > > the > > > > > > > > feature versioning match the release versioning? > > > > > > > > > > > > > > (Kowshik): The release version can be mapped to a set of > feature > > > > > > versions, > > > > > > > and this can be done, for example in the tool (or even external > > to > > > > the > > > > > > > tool). > > > > > > > Can you please clarify what I'm missing? > > > > > > > > > > > > > > > 111. "During regular operations, the data in the ZK node can > be > > > > > mutated > > > > > > > > only via a specific admin API served only by the > controller." I > > > am > > > > > > > > wondering why can't the controller auto finalize a feature > > > version > > > > > > after > > > > > > > > all brokers are upgraded? For new users who download the > latest > > > > > version > > > > > > > to > > > > > > > > build a new cluster, it's inconvenient for them to have to > > > manually > > > > > > > enable > > > > > > > > each feature. > > > > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but it will > > help > > > > > > > to decide whether the automation can be thought through in the > > > future > > > > > > > in a follow up KIP, or right now in this KIP. We may invest > > > > > > > in automation, but we have to decide whether we should do it > > > > > > > now or later. > > > > > > > > > > > > > > For the inconvenience that you mentioned, do you think the > > problem > > > > that > > > > > > you > > > > > > > mentioned can be overcome by asking for the cluster operator > to > > > run > > > > a > > > > > > > bootstrap script when he/she knows that a specific AK release > > has > > > > been > > > > > > > almost completely deployed in a cluster for the first time? > Idea > > is > > > > > that > > > > > > > the > > > > > > > bootstrap script will know how to map a specific AK release to > > > > > finalized > > > > > > > feature versions, and run the `kafka-features.sh` tool > > > appropriately > > > > > > > against > > > > > > > the cluster. > > > > > > > > > > > > > > Now, coming back to your automation proposal/question. > > > > > > > I do see the value of automated feature version finalization, > > but I > > > > > also > > > > > > > see > > > > > > > that this will open up several questions and some risks, as > > > explained > > > > > > > below. > > > > > > > The answers to these depend on the definition of the automation > > we > > > > > choose > > > > > > > to build, and how well does it fit into a kafka deployment. > > > > > > > Basically, it can be unsafe for the controller to finalize > > feature > > > > > > version > > > > > > > upgrades automatically, without learning about the intent of > the > > > > > cluster > > > > > > > operator. > > > > > > > 1. We would sometimes want to lock feature versions only when > we > > > have > > > > > > > externally verified > > > > > > > the stability of the broker binary. > > > > > > > 2. Sometimes only the cluster operator knows that a cluster > > upgrade > > > > is > > > > > > > complete, > > > > > > > and new brokers are highly unlikely to join the cluster. > > > > > > > 3. Only the cluster operator knows that the intent is to deploy > > the > > > > > same > > > > > > > version > > > > > > > of the new broker release across the entire cluster (i.e. the > > > latest > > > > > > > downloaded version). > > > > > > > 4. For downgrades, it appears the controller still needs some > > > > external > > > > > > > input > > > > > > > (such as the proposed tool) to finalize a feature version > > > downgrade. > > > > > > > > > > > > > > If we have automation, that automation can end up failing in > some > > > of > > > > > the > > > > > > > cases > > > > > > > above. Then, we need a way to declare that the cluster is "not > > > ready" > > > > > if > > > > > > > the > > > > > > > controller cannot automatically finalize some basic required > > > feature > > > > > > > version > > > > > > > upgrades across the cluster. We need to make the cluster > operator > > > > aware > > > > > > in > > > > > > > such a scenario (raise an alert or alike). > > > > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 > > > > instead > > > > > > of > > > > > > > 48. > > > > > > > > > > > > > > (Kowshik): Done. > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > Kowshik > > > > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > Thanks for the reply. A few more comments below. > > > > > > > > > > > > > > > > 100.6 For every new request, the admin needs to control who > is > > > > > allowed > > > > > > to > > > > > > > > issue that request if security is enabled. So, we need to > > assign > > > > the > > > > > > new > > > > > > > > request a ResourceType and possible AclOperations. See > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > > > > > > > > as > > > > > > > > an example. > > > > > > > > > > > > > > > > 105. If we change delete to disable, it's better to do this > > > > > > consistently > > > > > > > in > > > > > > > > request protocol and admin api as well. > > > > > > > > > > > > > > > > 110. The minVersion/maxVersion for features use int64. > > Currently, > > > > our > > > > > > > > release version schema is major.minor.bugfix (e.g. 2.5.0). > It's > > > > > > possible > > > > > > > > for new features to be included in minor releases too. Should > > we > > > > make > > > > > > the > > > > > > > > feature versioning match the release versioning? > > > > > > > > > > > > > > > > 111. "During regular operations, the data in the ZK node can > be > > > > > mutated > > > > > > > > only via a specific admin API served only by the > controller." I > > > am > > > > > > > > wondering why can't the controller auto finalize a feature > > > version > > > > > > after > > > > > > > > all brokers are upgraded? For new users who download the > latest > > > > > version > > > > > > > to > > > > > > > > build a new cluster, it's inconvenient for them to have to > > > manually > > > > > > > enable > > > > > > > > each feature. > > > > > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 > > > > instead > > > > > > of > > > > > > > > 48. > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam < > > > > > > kpraka...@confluent.io> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hey Jun, > > > > > > > > > > > > > > > > > > Thanks a lot for the great feedback! Please note that the > > > design > > > > > > > > > has changed a little bit on the KIP, and we now propagate > the > > > > > > finalized > > > > > > > > > features metadata only via ZK watches (instead of > > > > > > UpdateMetadataRequest > > > > > > > > > from the controller). > > > > > > > > > > > > > > > > > > Please find below my response to your questions/feedback, > > with > > > > the > > > > > > > prefix > > > > > > > > > "(Kowshik):". > > > > > > > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse > > > > > > > > > > 100.1 Since this request waits for responses from > brokers, > > > > should > > > > > > we > > > > > > > > add > > > > > > > > > a > > > > > > > > > > timeout in the request (like createTopicRequest)? > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout field. > > > Note: > > > > > we > > > > > > no > > > > > > > > > longer > > > > > > > > > wait for responses from brokers, since the design has been > > > > changed > > > > > so > > > > > > > > that > > > > > > > > > the > > > > > > > > > features information is propagated via ZK. Nevertheless, it > > is > > > > > right > > > > > > to > > > > > > > > > have a timeout > > > > > > > > > for the request. > > > > > > > > > > > > > > > > > > > 100.2 The response schema is a bit weird. Typically, the > > > > response > > > > > > > just > > > > > > > > > > shows an error code and an error message, instead of > > echoing > > > > the > > > > > > > > request. > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to just > > return > > > > an > > > > > > > error > > > > > > > > > code and a message. > > > > > > > > > Previously it was not echoing the "request", rather it was > > > > > returning > > > > > > > the > > > > > > > > > latest set of > > > > > > > > > cluster-wide finalized features (after applying the > updates). > > > But > > > > > you > > > > > > > are > > > > > > > > > right, > > > > > > > > > the additional info is not required, so I have removed it > > from > > > > the > > > > > > > > response > > > > > > > > > schema. > > > > > > > > > > > > > > > > > > > 100.3 Should we add a separate request to list/describe > the > > > > > > existing > > > > > > > > > > features? > > > > > > > > > > > > > > > > > > (Kowshik): This is already present in the KIP via the > > > > > > > 'DescribeFeatures' > > > > > > > > > Admin API, > > > > > > > > > which, underneath covers uses the ApiVersionsRequest to > > > > > list/describe > > > > > > > the > > > > > > > > > existing features. Please read the 'Tooling support' > section. > > > > > > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single > > > > request. > > > > > > For > > > > > > > > > > DELETE, the version field doesn't make sense. So, I guess > > the > > > > > > broker > > > > > > > > just > > > > > > > > > > ignores this? An alternative way is to have a separate > > > > > > > > > DeleteFeaturesRequest > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now to > have 2 > > > > > > separate > > > > > > > > > controller APIs > > > > > > > > > serving these different purposes: > > > > > > > > > 1. updateFeatures > > > > > > > > > 2. deleteFeatures > > > > > > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The > monotonically > > > > > > > increasing > > > > > > > > > > version of the metadata for finalized features." I am > > > wondering > > > > > why > > > > > > > the > > > > > > > > > > ordering is important? > > > > > > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called epoch > > > > (instead > > > > > of > > > > > > > > > version), and > > > > > > > > > it is just the ZK node version. Basically, this is the > epoch > > > for > > > > > the > > > > > > > > > cluster-wide > > > > > > > > > finalized feature version metadata. This metadata is served > > to > > > > > > clients > > > > > > > > via > > > > > > > > > the > > > > > > > > > ApiVersionsResponse (for reads). We propagate updates from > > the > > > > > > > > '/features' > > > > > > > > > ZK node > > > > > > > > > to all brokers, via ZK watches setup by each broker on the > > > > > > '/features' > > > > > > > > > node. > > > > > > > > > > > > > > > > > > Now here is why the ordering is important: > > > > > > > > > ZK watches don't propagate at the same time. As a result, > the > > > > > > > > > ApiVersionsResponse > > > > > > > > > is eventually consistent across brokers. This can introduce > > > cases > > > > > > > > > where clients see an older lower epoch of the features > > > metadata, > > > > > > after > > > > > > > a > > > > > > > > > more recent > > > > > > > > > higher epoch was returned at a previous point in time. We > > > expect > > > > > > > clients > > > > > > > > > to always employ the rule that the latest received higher > > epoch > > > > of > > > > > > > > metadata > > > > > > > > > always trumps an older smaller epoch. Those clients that > are > > > > > external > > > > > > > to > > > > > > > > > Kafka should strongly consider discovering the latest > > metadata > > > > once > > > > > > > > during > > > > > > > > > startup from the brokers, and if required refresh the > > metadata > > > > > > > > periodically > > > > > > > > > (to get the latest metadata). > > > > > > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for this new > > > request? > > > > > > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out which one > to > > > > > > specify? > > > > > > > > > Please could you provide me some pointers? I'll be glad to > > > update > > > > > the > > > > > > > > > KIP once I know the next steps. > > > > > > > > > > > > > > > > > > > 101. For the broker registration ZK node, should we bump > up > > > the > > > > > > > version > > > > > > > > > in > > > > > > > > > the json? > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the version in > > the > > > > > > broker > > > > > > > > json > > > > > > > > > by 1. > > > > > > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we need the > > epoch > > > > > > field. > > > > > > > > Each > > > > > > > > > > ZK node has an internal version field that is incremented > > on > > > > > every > > > > > > > > > update. > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node version > > > now, > > > > > > > instead > > > > > > > > of > > > > > > > > > explicitly > > > > > > > > > incremented epoch. > > > > > > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature version > > > > > > cluster-wide > > > > > > > > is > > > > > > > > > > left to the discretion of the logic implementing the > > feature > > > > (ex: > > > > > > can > > > > > > > > be > > > > > > > > > > done via dynamic broker config)." Does that mean the > broker > > > > > > > > registration > > > > > > > > > ZK > > > > > > > > > > node will be updated dynamically when this happens? > > > > > > > > > > > > > > > > > > (Kowshik): Not really. The text was just conveying that a > > > broker > > > > > > could > > > > > > > > > "know" of > > > > > > > > > a new feature version, but it does not mean the broker > should > > > > have > > > > > > also > > > > > > > > > activated the effects of the feature version. Knowing vs > > > > activation > > > > > > > are 2 > > > > > > > > > separate things, > > > > > > > > > and the latter can be achieved by dynamic config. I have > > > reworded > > > > > the > > > > > > > > text > > > > > > > > > to > > > > > > > > > make this clear to the reader. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 104. UpdateMetadataRequest > > > > > > > > > > 104.1 It would be useful to describe when the feature > > > metadata > > > > is > > > > > > > > > included > > > > > > > > > > in the request. My understanding is that it's only > included > > > if > > > > > (1) > > > > > > > > there > > > > > > > > > is > > > > > > > > > > a change to the finalized feature; (2) broker restart; > (3) > > > > > > controller > > > > > > > > > > failover. > > > > > > > > > > 104.2 The new fields have the following versions. Why are > > the > > > > > > > versions > > > > > > > > 3+ > > > > > > > > > > when the top version is bumped to 6? > > > > > > > > > > "fields": [ > > > > > > > > > > {"name": "Name", "type": "string", "versions": > > > "3+", > > > > > > > > > > "about": "The name of the feature."}, > > > > > > > > > > {"name": "Version", "type": "int64", > "versions": > > > > "3+", > > > > > > > > > > "about": "The finalized version for the > > feature."} > > > > > > > > > > ] > > > > > > > > > > > > > > > > > > (Kowshik): With the new improved design, we have completely > > > > > > eliminated > > > > > > > > the > > > > > > > > > need to > > > > > > > > > use UpdateMetadataRequest. This is because we now rely on > ZK > > to > > > > > > deliver > > > > > > > > the > > > > > > > > > notifications for changes to the '/features' ZK node. > > > > > > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete, > > > perhaps > > > > > > it's > > > > > > > > > better > > > > > > > > > > to use enable/disable? > > > > > > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so that we > > > instead > > > > > call > > > > > > > it > > > > > > > > > 'disable'. > > > > > > > > > However for 'update', it can now also refer to either an > > > upgrade > > > > > or a > > > > > > > > > forced downgrade. > > > > > > > > > Therefore, I have left it the way it is, just calling it as > > > just > > > > > > > > 'update'. > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few comments > > below. > > > > > > > > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse > > > > > > > > > > 100.1 Since this request waits for responses from > brokers, > > > > should > > > > > > we > > > > > > > > add > > > > > > > > > a > > > > > > > > > > timeout in the request (like createTopicRequest)? > > > > > > > > > > 100.2 The response schema is a bit weird. Typically, the > > > > response > > > > > > > just > > > > > > > > > > shows an error code and an error message, instead of > > echoing > > > > the > > > > > > > > request. > > > > > > > > > > 100.3 Should we add a separate request to list/describe > the > > > > > > existing > > > > > > > > > > features? > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single > > > > request. > > > > > > For > > > > > > > > > > DELETE, the version field doesn't make sense. So, I guess > > the > > > > > > broker > > > > > > > > just > > > > > > > > > > ignores this? An alternative way is to have a separate > > > > > > > > > > DeleteFeaturesRequest > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The > monotonically > > > > > > > increasing > > > > > > > > > > version of the metadata for finalized features." I am > > > wondering > > > > > why > > > > > > > the > > > > > > > > > > ordering is important? > > > > > > > > > > 100.6 Could you specify the required ACL for this new > > > request? > > > > > > > > > > > > > > > > > > > > 101. For the broker registration ZK node, should we bump > up > > > the > > > > > > > version > > > > > > > > > in > > > > > > > > > > the json? > > > > > > > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we need the > > epoch > > > > > > field. > > > > > > > > Each > > > > > > > > > > ZK node has an internal version field that is incremented > > on > > > > > every > > > > > > > > > update. > > > > > > > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature version > > > > > > cluster-wide > > > > > > > > is > > > > > > > > > > left to the discretion of the logic implementing the > > feature > > > > (ex: > > > > > > can > > > > > > > > be > > > > > > > > > > done via dynamic broker config)." Does that mean the > broker > > > > > > > > registration > > > > > > > > > ZK > > > > > > > > > > node will be updated dynamically when this happens? > > > > > > > > > > > > > > > > > > > > 104. UpdateMetadataRequest > > > > > > > > > > 104.1 It would be useful to describe when the feature > > > metadata > > > > is > > > > > > > > > included > > > > > > > > > > in the request. My understanding is that it's only > included > > > if > > > > > (1) > > > > > > > > there > > > > > > > > > is > > > > > > > > > > a change to the finalized feature; (2) broker restart; > (3) > > > > > > controller > > > > > > > > > > failover. > > > > > > > > > > 104.2 The new fields have the following versions. Why are > > the > > > > > > > versions > > > > > > > > 3+ > > > > > > > > > > when the top version is bumped to 6? > > > > > > > > > > "fields": [ > > > > > > > > > > {"name": "Name", "type": "string", "versions": > > > "3+", > > > > > > > > > > "about": "The name of the feature."}, > > > > > > > > > > {"name": "Version", "type": "int64", > "versions": > > > > "3+", > > > > > > > > > > "about": "The finalized version for the > > feature."} > > > > > > > > > > ] > > > > > > > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using update/delete, > > > perhaps > > > > > > it's > > > > > > > > > better > > > > > > > > > > to use enable/disable? > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam < > > > > > > > > kpraka...@confluent.io > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hey Boyang, > > > > > > > > > > > > > > > > > > > > > > Thanks for the great feedback! I have updated the KIP > > based > > > > on > > > > > > your > > > > > > > > > > > feedback. > > > > > > > > > > > Please find my response below for your comments, look > for > > > > > > sentences > > > > > > > > > > > starting > > > > > > > > > > > with "(Kowshik)" below. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin handling > > EOS > > > > > > > traffic" > > > > > > > > > > could > > > > > > > > > > > be > > > > > > > > > > > > converted as "When is it safe for the brokers to > start > > > > > serving > > > > > > > new > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not > explained > > > > > earlier > > > > > > > in > > > > > > > > > the > > > > > > > > > > > > context. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > > > > > > > > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata version > > > > number > > > > > > part > > > > > > > > > > seems a > > > > > > > > > > > > bit blurred. Could you point a reference to later > > section > > > > > that > > > > > > we > > > > > > > > > going > > > > > > > > > > > to > > > > > > > > > > > > store it in Zookeeper and update it every time when > > there > > > > is > > > > > a > > > > > > > > > feature > > > > > > > > > > > > change? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a reference in > > the > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a > Non-goal > > of > > > > the > > > > > > > KIP, > > > > > > > > > for > > > > > > > > > > > > features such as group coordinator semantics, there > is > > no > > > > > legal > > > > > > > > > > scenario > > > > > > > > > > > to > > > > > > > > > > > > perform a downgrade at all. So having downgrade door > > open > > > > is > > > > > > > pretty > > > > > > > > > > > > error-prone as human faults happen all the time. I'm > > > > assuming > > > > > > as > > > > > > > > new > > > > > > > > > > > > features are implemented, it's not very hard to add a > > > flag > > > > > > during > > > > > > > > > > feature > > > > > > > > > > > > creation to indicate whether this feature is > > > > "downgradable". > > > > > > > Could > > > > > > > > > you > > > > > > > > > > > > explain a bit more on the extra engineering effort > for > > > > > shipping > > > > > > > > this > > > > > > > > > > KIP > > > > > > > > > > > > with downgrade protection in place? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree here. > > While > > > I > > > > > > agree > > > > > > > > that > > > > > > > > > > > accidental > > > > > > > > > > > downgrades can cause problems, I also think sometimes > > > > > downgrades > > > > > > > > should > > > > > > > > > > > be allowed for emergency reasons (not all downgrades > > cause > > > > > > issues). > > > > > > > > > > > It is just subjective to the feature being downgraded. > > > > > > > > > > > > > > > > > > > > > > To be more strict about feature version downgrades, I > > have > > > > > > modified > > > > > > > > the > > > > > > > > > > KIP > > > > > > > > > > > proposing that we mandate a `--force-downgrade` flag be > > > used > > > > in > > > > > > the > > > > > > > > > > > UPDATE_FEATURES api > > > > > > > > > > > and the tooling, whenever the human is downgrading a > > > > finalized > > > > > > > > feature > > > > > > > > > > > version. > > > > > > > > > > > Hopefully this should cover the requirement, until we > > find > > > > the > > > > > > need > > > > > > > > for > > > > > > > > > > > advanced downgrade support. > > > > > > > > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature > > > versions > > > > > will > > > > > > > be > > > > > > > > > > > defined > > > > > > > > > > > > in the broker code." So this means in order to > > restrict a > > > > > > certain > > > > > > > > > > > feature, > > > > > > > > > > > > we need to start the broker first and then send a > > feature > > > > > > gating > > > > > > > > > > request > > > > > > > > > > > > immediately, which introduces a time gap and the > > > > > > > intended-to-close > > > > > > > > > > > feature > > > > > > > > > > > > could actually serve request during this phase. Do > you > > > > think > > > > > we > > > > > > > > > should > > > > > > > > > > > also > > > > > > > > > > > > support configurations as well so that admin user > could > > > > > freely > > > > > > > roll > > > > > > > > > up > > > > > > > > > > a > > > > > > > > > > > > cluster with all nodes complying the same feature > > gating, > > > > > > without > > > > > > > > > > > worrying > > > > > > > > > > > > about the turnaround time to propagate the message > only > > > > after > > > > > > the > > > > > > > > > > cluster > > > > > > > > > > > > starts up? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): This is a great point/question. One of the > > > > > > expectations > > > > > > > > out > > > > > > > > > of > > > > > > > > > > > this KIP, which is > > > > > > > > > > > already followed in the broker, is the following. > > > > > > > > > > > - Imagine at time T1 the broker starts up and > registers > > > it’s > > > > > > > > presence > > > > > > > > > in > > > > > > > > > > > ZK, > > > > > > > > > > > along with advertising it’s supported features. > > > > > > > > > > > - Imagine at a future time T2 the broker receives the > > > > > > > > > > > UpdateMetadataRequest > > > > > > > > > > > from the controller, which contains the latest > > finalized > > > > > > > features > > > > > > > > as > > > > > > > > > > > seen by > > > > > > > > > > > the controller. The broker validates this data > against > > > > it’s > > > > > > > > > supported > > > > > > > > > > > features to > > > > > > > > > > > make sure there is no mismatch (it will shutdown if > > > there > > > > is > > > > > > an > > > > > > > > > > > incompatibility). > > > > > > > > > > > > > > > > > > > > > > It is expected that during the time between the 2 > events > > T1 > > > > and > > > > > > T2, > > > > > > > > the > > > > > > > > > > > broker is > > > > > > > > > > > almost a silent entity in the cluster. It does not add > > any > > > > > value > > > > > > to > > > > > > > > the > > > > > > > > > > > cluster, or carry > > > > > > > > > > > out any important broker activities. By “important”, I > > mean > > > > it > > > > > is > > > > > > > not > > > > > > > > > > doing > > > > > > > > > > > mutations > > > > > > > > > > > on it’s persistence, not mutating critical in-memory > > state, > > > > > won’t > > > > > > > be > > > > > > > > > > > serving > > > > > > > > > > > produce/fetch requests. Note it doesn’t even know it’s > > > > assigned > > > > > > > > > > partitions > > > > > > > > > > > until > > > > > > > > > > > it receives UpdateMetadataRequest from controller. > > Anything > > > > the > > > > > > > > broker > > > > > > > > > is > > > > > > > > > > > doing up > > > > > > > > > > > until this point is not damaging/useful. > > > > > > > > > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see this new > > section: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an > > > existing > > > > > > > > Feature", > > > > > > > > > > may > > > > > > > > > > > be > > > > > > > > > > > > I misunderstood something, I thought the features are > > > > defined > > > > > > in > > > > > > > > > broker > > > > > > > > > > > > code, so admin could not really create a new feature? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! You understood this right. Here > > > > adding > > > > > a > > > > > > > > > feature > > > > > > > > > > > means we are > > > > > > > > > > > adding a cluster-wide finalized *max* version for a > > feature > > > > > that > > > > > > > was > > > > > > > > > > > previously never finalized. > > > > > > > > > > > I have clarified this in the KIP now. > > > > > > > > > > > > > > > > > > > > > > > 6. I think we need a separate error code like > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > > > > > > > > > > > to > > > > > > > > > > > > reject a concurrent feature update request. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP adding > > the > > > > > above > > > > > > > (see > > > > > > > > > > > 'Tooling support -> Admin API changes'). > > > > > > > > > > > > > > > > > > > > > > > 7. I think we haven't discussed the alternative > > solution > > > to > > > > > > pass > > > > > > > > the > > > > > > > > > > > > feature information through Zookeeper. Is that > > mentioned > > > in > > > > > the > > > > > > > KIP > > > > > > > > > to > > > > > > > > > > > > justify why using UpdateMetadata is more favorable? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads finalized > > > feature > > > > > info > > > > > > > > > stored > > > > > > > > > > in > > > > > > > > > > > ZK, > > > > > > > > > > > only during startup when it does a validation. When > > serving > > > > > > > > > > > `ApiVersionsRequest`, the > > > > > > > > > > > broker does not read this info from ZK directly. I'd > > > imagine > > > > > the > > > > > > > risk > > > > > > > > > is > > > > > > > > > > > that it can increase > > > > > > > > > > > the ZK read QPS which can be a bottleneck for the > system. > > > > > Today, > > > > > > in > > > > > > > > > Kafka > > > > > > > > > > > we use the > > > > > > > > > > > controller to fan out ZK updates to brokers and we want > > to > > > > > stick > > > > > > to > > > > > > > > > that > > > > > > > > > > > pattern to avoid > > > > > > > > > > > the ZK read bottleneck when serving > `ApiVersionsRequest`. > > > > > > > > > > > > > > > > > > > > > > > 8. I was under the impression that user could > > configure a > > > > > range > > > > > > > of > > > > > > > > > > > > supported versions, what's the trade-off for allowing > > > > single > > > > > > > > > finalized > > > > > > > > > > > > version only? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great question! The finalized version of a > > > feature > > > > > > > > basically > > > > > > > > > > > refers to > > > > > > > > > > > the cluster-wide finalized feature "maximum" version. > For > > > > > > example, > > > > > > > if > > > > > > > > > the > > > > > > > > > > > 'group_coordinator' feature > > > > > > > > > > > has the finalized version set to 10, then, it means > that > > > > > > > cluster-wide > > > > > > > > > all > > > > > > > > > > > versions upto v10 are > > > > > > > > > > > supported for this feature. However, note that if some > > > > version > > > > > > (ex: > > > > > > > > v0) > > > > > > > > > > > gets deprecated > > > > > > > > > > > for this feature, then we don’t convey that using this > > > scheme > > > > > > (also > > > > > > > > > > > supporting deprecation is a non-goal). > > > > > > > > > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all points, > > > refering > > > > to > > > > > > > > > finalized > > > > > > > > > > > feature "maximum" versions. > > > > > > > > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the "client" > > here > > > > may > > > > > > be > > > > > > > a > > > > > > > > > > > producer > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen < > > > > > > > > > reluctanthero...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hey Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple of > questions: > > > > > > > > > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin handling > > EOS > > > > > > > traffic" > > > > > > > > > > could > > > > > > > > > > > be > > > > > > > > > > > > converted as "When is it safe for the brokers to > start > > > > > serving > > > > > > > new > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not > explained > > > > > earlier > > > > > > > in > > > > > > > > > the > > > > > > > > > > > > context. > > > > > > > > > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata version > > > > number > > > > > > part > > > > > > > > > > seems a > > > > > > > > > > > > bit blurred. Could you point a reference to later > > section > > > > > that > > > > > > we > > > > > > > > > going > > > > > > > > > > > to > > > > > > > > > > > > store it in Zookeeper and update it every time when > > there > > > > is > > > > > a > > > > > > > > > feature > > > > > > > > > > > > change? > > > > > > > > > > > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a > Non-goal > > of > > > > the > > > > > > > KIP, > > > > > > > > > for > > > > > > > > > > > > features such as group coordinator semantics, there > is > > no > > > > > legal > > > > > > > > > > scenario > > > > > > > > > > > to > > > > > > > > > > > > perform a downgrade at all. So having downgrade door > > open > > > > is > > > > > > > pretty > > > > > > > > > > > > error-prone as human faults happen all the time. I'm > > > > assuming > > > > > > as > > > > > > > > new > > > > > > > > > > > > features are implemented, it's not very hard to add a > > > flag > > > > > > during > > > > > > > > > > feature > > > > > > > > > > > > creation to indicate whether this feature is > > > > "downgradable". > > > > > > > Could > > > > > > > > > you > > > > > > > > > > > > explain a bit more on the extra engineering effort > for > > > > > shipping > > > > > > > > this > > > > > > > > > > KIP > > > > > > > > > > > > with downgrade protection in place? > > > > > > > > > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of feature > > > versions > > > > > will > > > > > > > be > > > > > > > > > > > defined > > > > > > > > > > > > in the broker code." So this means in order to > > restrict a > > > > > > certain > > > > > > > > > > > feature, > > > > > > > > > > > > we need to start the broker first and then send a > > feature > > > > > > gating > > > > > > > > > > request > > > > > > > > > > > > immediately, which introduces a time gap and the > > > > > > > intended-to-close > > > > > > > > > > > feature > > > > > > > > > > > > could actually serve request during this phase. Do > you > > > > think > > > > > we > > > > > > > > > should > > > > > > > > > > > also > > > > > > > > > > > > support configurations as well so that admin user > could > > > > > freely > > > > > > > roll > > > > > > > > > up > > > > > > > > > > a > > > > > > > > > > > > cluster with all nodes complying the same feature > > gating, > > > > > > without > > > > > > > > > > > worrying > > > > > > > > > > > > about the turnaround time to propagate the message > only > > > > after > > > > > > the > > > > > > > > > > cluster > > > > > > > > > > > > starts up? > > > > > > > > > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or deleting an > > > existing > > > > > > > > Feature", > > > > > > > > > > may > > > > > > > > > > > be > > > > > > > > > > > > I misunderstood something, I thought the features are > > > > defined > > > > > > in > > > > > > > > > broker > > > > > > > > > > > > code, so admin could not really create a new feature? > > > > > > > > > > > > > > > > > > > > > > > > 6. I think we need a separate error code like > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > > > > > > > > > > > to > > > > > > > > > > > > reject a concurrent feature update request. > > > > > > > > > > > > > > > > > > > > > > > > 7. I think we haven't discussed the alternative > > solution > > > to > > > > > > pass > > > > > > > > the > > > > > > > > > > > > feature information through Zookeeper. Is that > > mentioned > > > in > > > > > the > > > > > > > KIP > > > > > > > > > to > > > > > > > > > > > > justify why using UpdateMetadata is more favorable? > > > > > > > > > > > > > > > > > > > > > > > > 8. I was under the impression that user could > > configure a > > > > > range > > > > > > > of > > > > > > > > > > > > supported versions, what's the trade-off for allowing > > > > single > > > > > > > > > finalized > > > > > > > > > > > > version only? > > > > > > > > > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the "client" > > here > > > > may > > > > > > be > > > > > > > a > > > > > > > > > > > producer > > > > > > > > > > > > > > > > > > > > > > > > Boyang > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe < > > > > > > cmcc...@apache.org > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam > > wrote: > > > > > > > > > > > > > > Hi Colin, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback! I've changed the KIP to > > > > address > > > > > > your > > > > > > > > > > > > > > suggestions. > > > > > > > > > > > > > > Please find below my explanation. Here is a link > to > > > KIP > > > > > > 584: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. '__data_version__' is the version of the > > finalized > > > > > > feature > > > > > > > > > > > metadata > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the > > > > > > > '__schema_version__' > > > > > > > > is > > > > > > > > > > the > > > > > > > > > > > > > > version of the schema of the data persisted in > ZK. > > > > These > > > > > > > serve > > > > > > > > > > > > different > > > > > > > > > > > > > > purposes. '__data_version__' is is useful mainly > to > > > > > clients > > > > > > > > > during > > > > > > > > > > > > reads, > > > > > > > > > > > > > > to differentiate between the 2 versions of > > eventually > > > > > > > > consistent > > > > > > > > > > > > > 'finalized > > > > > > > > > > > > > > features' metadata (i.e. larger metadata version > is > > > > more > > > > > > > > recent). > > > > > > > > > > > > > > '__schema_version__' provides an additional > degree > > of > > > > > > > > > flexibility, > > > > > > > > > > > > where > > > > > > > > > > > > > if > > > > > > > > > > > > > > we decide to change the schema for '/features' > node > > > in > > > > ZK > > > > > > (in > > > > > > > > the > > > > > > > > > > > > > future), > > > > > > > > > > > > > > then we can manage broker roll outs suitably > (i.e. > > > > > > > > > > > > > > serialization/deserialization of the ZK data can > be > > > > > handled > > > > > > > > > > safely). > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > > > If you're talking about a number that lets you know > > if > > > > data > > > > > > is > > > > > > > > more > > > > > > > > > > or > > > > > > > > > > > > > less recent, we would typically call that an epoch, > > and > > > > > not a > > > > > > > > > > version. > > > > > > > > > > > > For > > > > > > > > > > > > > the ZK data structures, the word "version" is > > typically > > > > > > > reserved > > > > > > > > > for > > > > > > > > > > > > > describing changes to the overall schema of the > data > > > that > > > > > is > > > > > > > > > written > > > > > > > > > > to > > > > > > > > > > > > > ZooKeeper. We don't even really change the > "version" > > > of > > > > > > those > > > > > > > > > > schemas > > > > > > > > > > > > that > > > > > > > > > > > > > much, since most changes are backwards-compatible. > > But > > > > we > > > > > do > > > > > > > > > include > > > > > > > > > > > > that > > > > > > > > > > > > > version field just in case. > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think we really need an epoch here, though, > > > since > > > > > we > > > > > > > can > > > > > > > > > just > > > > > > > > > > > > look > > > > > > > > > > > > > at the broker epoch. Whenever the broker > registers, > > > its > > > > > > epoch > > > > > > > > will > > > > > > > > > > be > > > > > > > > > > > > > greater than the previous broker epoch. And the > > newly > > > > > > > registered > > > > > > > > > > data > > > > > > > > > > > > will > > > > > > > > > > > > > take priority. This will be a lot simpler than > > adding > > > a > > > > > > > separate > > > > > > > > > > epoch > > > > > > > > > > > > > system, I think. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Regarding admin client needing min and max > > > > > information - > > > > > > > you > > > > > > > > > are > > > > > > > > > > > > > right! > > > > > > > > > > > > > > I've changed the KIP such that the Admin API also > > > > allows > > > > > > the > > > > > > > > user > > > > > > > > > > to > > > > > > > > > > > > read > > > > > > > > > > > > > > 'supported features' from a specific broker. > Please > > > > look > > > > > at > > > > > > > the > > > > > > > > > > > section > > > > > > > > > > > > > > "Admin API changes". > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was > > not > > > > > > > > deliberate. > > > > > > > > > > > I've > > > > > > > > > > > > > > improved the KIP to just use `long` at all > places. > > > > > > > > > > > > > > > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - > you > > > are > > > > > > right! > > > > > > > > > I've > > > > > > > > > > > > > updated > > > > > > > > > > > > > > the KIP sketching the functionality provided by > > this > > > > > tool, > > > > > > > with > > > > > > > > > > some > > > > > > > > > > > > > > examples. Please look at the section "Tooling > > support > > > > > > > > examples". > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, Kowshik. > > > > > > > > > > > > > > > > > > > > > > > > > > cheers, > > > > > > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe < > > > > > > > > > cmcc...@apache.org> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In the "Schema" section, do we really need both > > > > > > > > > > __schema_version__ > > > > > > > > > > > > and > > > > > > > > > > > > > > > __data_version__? Can we just have a single > > > version > > > > > > field > > > > > > > > > here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function have some > > way > > > to > > > > > get > > > > > > > the > > > > > > > > > min > > > > > > > > > > > and > > > > > > > > > > > > > max > > > > > > > > > > > > > > > information that we're exposing as well? I > guess > > > we > > > > > > could > > > > > > > > have > > > > > > > > > > > min, > > > > > > > > > > > > > max, > > > > > > > > > > > > > > > and current. Unrelated: is the use of Long > > rather > > > > than > > > > > > > long > > > > > > > > > > > > deliberate > > > > > > > > > > > > > > > here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It would be good to describe how the command > line > > > > tool > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work. For > > example > > > > the > > > > > > > flags > > > > > > > > > that > > > > > > > > > > > it > > > > > > > > > > > > > will > > > > > > > > > > > > > > > take and the output that it will generate to > > > STDOUT. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > cheers, > > > > > > > > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik > Prakasam > > > > wrote: > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've opened KIP-584 > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> < > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > is intended to provide a versioning scheme > for > > > > > > features. > > > > > > > > I'd > > > > > > > > > > like > > > > > > > > > > > > to > > > > > > > > > > > > > use > > > > > > > > > > > > > > > > this thread to discuss the same. I'd > appreciate > > > any > > > > > > > > feedback > > > > > > > > > on > > > > > > > > > > > > this. > > > > > > > > > > > > > > > > Here > > > > > > > > > > > > > > > > is a link to KIP-584 > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >