Hi all, Thank you very much for all the insightful feedback! How do you feel about the KIP? Does the scope and the write up look OK to you, and is it time to call a vote?
Cheers, Kowshik On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > Thank you for the suggestion! I have updated the KIP, please find my > response below. > > > 200. I guess you are saying only when the allowDowngrade field is set, > the > > finalized feature version can go backward. Otherwise, it can only go up. > > That makes sense. It would be useful to make that clear when explaining > > the usage of the allowDowngrade field. In the validation section, we > have " > > /features' from {"max_version_level": X} to {"max_version_level": X’}", > it > > seems that we need to mention Y there. > > (Kowshik): Great point! Yes, that is correct. Done, I have updated the > validations > section explaining the above. Here is a link to this section: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations > > > Cheers, > Kowshik > > > > > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <j...@confluent.io> wrote: > >> Hi, Kowshik, >> >> 200. I guess you are saying only when the allowDowngrade field is set, the >> finalized feature version can go backward. Otherwise, it can only go up. >> That makes sense. It would be useful to make that clear when explaining >> the usage of the allowDowngrade field. In the validation section, we >> have " >> /features' from {"max_version_level": X} to {"max_version_level": X’}", it >> seems that we need to mention Y there. >> >> Thanks, >> >> Jun >> >> On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <kpraka...@confluent.io >> > >> wrote: >> >> > Hi Jun, >> > >> > Great question! Please find my response below. >> > >> > > 200. My understanding is that If the CLI tool passes the >> > > '--allow-downgrade' flag when updating a specific feature, then a >> future >> > > downgrade is possible. Otherwise, the feature is now downgradable. If >> so, >> > I >> > > was wondering how the controller remembers this since it can be >> restarted >> > > over time? >> > >> > (Kowshik): The purpose of the flag was to just restrict the user intent >> for >> > a specific request. >> > It seems to me that to avoid confusion, I could call the flag as >> > `--try-downgrade` instead. >> > Then this makes it clear, that, the controller just has to consider the >> ask >> > from >> > the user as an explicit request to attempt a downgrade. >> > >> > The flag does not act as an override on controller's decision making >> that >> > decides whether >> > a flag is downgradable (these decisions on whether to allow a flag to be >> > downgraded >> > from a specific version level, can be embedded in the controller code). >> > >> > Please let me know what you think. >> > Sorry if I misunderstood the original question. >> > >> > >> > Cheers, >> > Kowshik >> > >> > >> > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <j...@confluent.io> wrote: >> > >> > > Hi, Kowshik, >> > > >> > > Thanks for the reply. Makes sense. Just one more question. >> > > >> > > 200. My understanding is that If the CLI tool passes the >> > > '--allow-downgrade' flag when updating a specific feature, then a >> future >> > > downgrade is possible. Otherwise, the feature is now downgradable. If >> > so, I >> > > was wondering how the controller remembers this since it can be >> restarted >> > > over time? >> > > >> > > Jun >> > > >> > > >> > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam < >> kpraka...@confluent.io >> > > >> > > wrote: >> > > >> > > > Hi Jun, >> > > > >> > > > Thanks a lot for the feedback and the questions! >> > > > Please find my response below. >> > > > >> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. >> It >> > > seems >> > > > > that field needs to be persisted somewhere in ZK? >> > > > >> > > > (Kowshik): Great question! Below is my explanation. Please help me >> > > > understand, >> > > > if you feel there are cases where we would need to still persist it >> in >> > > ZK. >> > > > >> > > > Firstly I have updated my thoughts into the KIP now, under the >> > > 'guidelines' >> > > > section: >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows >> > > > >> > > > The allowDowngrade boolean field is just to restrict the user >> intent, >> > and >> > > > to remind >> > > > them to double check their intent before proceeding. It should be >> set >> > to >> > > > true >> > > > by the user in a request, only when the user intent is to forcefully >> > > > "attempt" a >> > > > downgrade of a specific feature's max version level, to the provided >> > > value >> > > > in >> > > > the request. >> > > > >> > > > We can extend this safeguard. The controller (on it's end) can >> maintain >> > > > rules in the code, that, for safety reasons would outright reject >> > certain >> > > > downgrades >> > > > from a specific max_version_level for a specific feature. Such >> > rejections >> > > > may >> > > > happen depending on the feature being downgraded, and from what >> version >> > > > level. >> > > > >> > > > The CLI tool only allows a downgrade attempt in conjunction with >> > specific >> > > > flags and sub-commands. For example, in the CLI tool, if the user >> uses >> > > the >> > > > 'downgrade-all' command, or passes '--allow-downgrade' flag when >> > > updating a >> > > > specific feature, only then the tool will translate this ask to >> setting >> > > > 'allowDowngrade' field in the request to the server. >> > > > >> > > > > 201. UpdateFeaturesResponse has the following top level fields. >> > Should >> > > > > those fields be per feature? >> > > > > >> > > > > "fields": [ >> > > > > { "name": "ErrorCode", "type": "int16", "versions": "0+", >> > > > > "about": "The error code, or 0 if there was no error." }, >> > > > > { "name": "ErrorMessage", "type": "string", "versions": "0+", >> > > > > "about": "The error message, or null if there was no >> error." } >> > > > > ] >> > > > >> > > > (Kowshik): Great question! >> > > > As such, the API is transactional, as explained in the sections >> linked >> > > > below. >> > > > Either all provided FeatureUpdate was applied, or none. >> > > > It's the reason I felt we can have just one error code + message. >> > > > Happy to extend this if you feel otherwise. Please let me know. >> > > > >> > > > Link to sections: >> > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController >> > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees >> > > > >> > > > > 202. The /features path in ZK has a field min_version_level. Which >> > API >> > > > and >> > > > > tool can change that value? >> > > > >> > > > (Kowshik): Great question! Currently this cannot be modified by >> using >> > the >> > > > API or the tool. >> > > > Feature version deprecation (by raising min_version_level) can be >> done >> > > only >> > > > by the Controller directly. The rationale is explained in this >> section: >> > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation >> > > > >> > > > >> > > > Cheers, >> > > > Kowshik >> > > > >> > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <j...@confluent.io> wrote: >> > > > >> > > > > Hi, Kowshik, >> > > > > >> > > > > Thanks for addressing those comments. Just a few more minor >> comments. >> > > > > >> > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. >> It >> > > seems >> > > > > that field needs to be persisted somewhere in ZK? >> > > > > >> > > > > 201. UpdateFeaturesResponse has the following top level fields. >> > Should >> > > > > those fields be per feature? >> > > > > >> > > > > "fields": [ >> > > > > { "name": "ErrorCode", "type": "int16", "versions": "0+", >> > > > > "about": "The error code, or 0 if there was no error." }, >> > > > > { "name": "ErrorMessage", "type": "string", "versions": "0+", >> > > > > "about": "The error message, or null if there was no >> error." } >> > > > > ] >> > > > > >> > > > > 202. The /features path in ZK has a field min_version_level. Which >> > API >> > > > and >> > > > > tool can change that value? >> > > > > >> > > > > Jun >> > > > > >> > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam < >> > > kpraka...@confluent.io >> > > > > >> > > > > wrote: >> > > > > >> > > > > > Hi Jun, >> > > > > > >> > > > > > Thanks for the feedback! I have updated the KIP-584 addressing >> your >> > > > > > comments. >> > > > > > Please find my response below. >> > > > > > >> > > > > > > 100.6 You can look for the sentence "This operation requires >> > ALTER >> > > on >> > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in >> > > > > > > KafkaApis.authorize(). >> > > > > > >> > > > > > (Kowshik): Done. Great point! For the newly introduced >> > > UPDATE_FEATURES >> > > > > api, >> > > > > > I have added a >> > > > > > requirement that AclOperation.ALTER is required on >> > > > ResourceType.CLUSTER. >> > > > > > >> > > > > > > 110. Keeping the feature version as int is probably fine. I >> just >> > > felt >> > > > > > that >> > > > > > > for some of the common user interactions, it's more >> convenient to >> > > > > > > relate that to a release version. For example, if a user >> wants to >> > > > > > downgrade >> > > > > > > to a release 2.5, it's easier for the user to use the tool >> like >> > > "tool >> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X >> > --version >> > > > 6". >> > > > > > >> > > > > > (Kowshik): Great point. Generally, maximum feature version >> levels >> > are >> > > > not >> > > > > > downgradable after >> > > > > > they are finalized in the cluster. This is because, as a >> guideline >> > > > > bumping >> > > > > > feature version level usually is used mainly to convey important >> > > > breaking >> > > > > > changes. >> > > > > > Despite the above, there may be some extreme/rare cases where a >> > user >> > > > > wants >> > > > > > to downgrade >> > > > > > all features to a specific previous release. The user may want >> to >> > do >> > > > this >> > > > > > just >> > > > > > prior to rolling back a Kafka cluster to a previous release. >> > > > > > >> > > > > > To support the above, I have made a change to the KIP explaining >> > that >> > > > the >> > > > > > CLI tool is versioned. >> > > > > > The CLI tool internally has knowledge about a map of features to >> > > their >> > > > > > respective max >> > > > > > versions supported by the Broker. The tool's knowledge of >> features >> > > and >> > > > > > their version values, >> > > > > > is limited to the version of the CLI tool itself i.e. the >> > information >> > > > is >> > > > > > packaged into the CLI tool >> > > > > > when it is released. Whenever a Kafka release introduces a new >> > > feature >> > > > > > version, or modifies >> > > > > > an existing feature version, the CLI tool shall also be updated >> > with >> > > > this >> > > > > > information, >> > > > > > Newer versions of the CLI tool will be released as part of the >> > Kafka >> > > > > > releases. >> > > > > > >> > > > > > Therefore, to achieve the downgrade need, the user just needs to >> > run >> > > > the >> > > > > > version of >> > > > > > the CLI tool that's part of the particular previous release that >> > > he/she >> > > > > is >> > > > > > downgrading to. >> > > > > > To help the user with this, there is a new command added to the >> CLI >> > > > tool >> > > > > > called `downgrade-all`. >> > > > > > This essentially downgrades max version levels of all features >> in >> > the >> > > > > > cluster to the versions >> > > > > > known to the CLI tool internally. >> > > > > > >> > > > > > I have explained the above in the KIP under these sections: >> > > > > > >> > > > > > Tooling support (have explained that the CLI tool is versioned): >> > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport >> > > > > > >> > > > > > Regular CLI tool usage (please refer to point #3, and see the >> > tooling >> > > > > > example) >> > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage >> > > > > > >> > > > > > > 110. Similarly, if the client library finds a feature mismatch >> > with >> > > > the >> > > > > > broker, >> > > > > > > the client likely needs to log some error message for the >> user to >> > > > take >> > > > > > some >> > > > > > > actions. It's much more actionable if the error message is >> > "upgrade >> > > > the >> > > > > > > broker to release version 2.6" than just "upgrade the broker >> to >> > > > feature >> > > > > > > version 7". >> > > > > > >> > > > > > (Kowshik): That's a really good point! If we use ints for >> feature >> > > > > versions, >> > > > > > the best >> > > > > > message that client can print for debugging is "broker doesn't >> > > support >> > > > > > feature version 7", and alongside that print the supported >> version >> > > > range >> > > > > > returned >> > > > > > by the broker. Then, does it sound reasonable that the user >> could >> > > then >> > > > > > reference >> > > > > > Kafka release logs to figure out which version of the broker >> > release >> > > is >> > > > > > required >> > > > > > be deployed, to support feature version 7? I couldn't think of a >> > > better >> > > > > > strategy here. >> > > > > > >> > > > > > > 120. When should a developer bump up the version of a feature? >> > > > > > >> > > > > > (Kowshik): Great question! In the KIP, I have added a section: >> > > > > 'Guidelines >> > > > > > on feature versions and workflows' >> > > > > > providing some guidelines on when to use the versioned feature >> > flags, >> > > > and >> > > > > > what >> > > > > > are the regular workflows with the CLI tool. >> > > > > > >> > > > > > Link to the relevant sections: >> > > > > > Guidelines: >> > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows >> > > > > > >> > > > > > Regular CLI tool usage: >> > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage >> > > > > > >> > > > > > Advanced CLI tool usage: >> > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage >> > > > > > >> > > > > > >> > > > > > Cheers, >> > > > > > Kowshik >> > > > > > >> > > > > > >> > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <j...@confluent.io> >> wrote: >> > > > > > >> > > > > > > Hi, Kowshik, >> > > > > > > >> > > > > > > Thanks for the reply. A few more comments. >> > > > > > > >> > > > > > > 110. Keeping the feature version as int is probably fine. I >> just >> > > felt >> > > > > > that >> > > > > > > for some of the common user interactions, it's more >> convenient to >> > > > > > > relate that to a release version. For example, if a user >> wants to >> > > > > > downgrade >> > > > > > > to a release 2.5, it's easier for the user to use the tool >> like >> > > "tool >> > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X >> > --version >> > > > 6". >> > > > > > > Similarly, if the client library finds a feature mismatch with >> > the >> > > > > > broker, >> > > > > > > the client likely needs to log some error message for the >> user to >> > > > take >> > > > > > some >> > > > > > > actions. It's much more actionable if the error message is >> > "upgrade >> > > > the >> > > > > > > broker to release version 2.6" than just "upgrade the broker >> to >> > > > feature >> > > > > > > version 7". >> > > > > > > >> > > > > > > 111. Sounds good. >> > > > > > > >> > > > > > > 120. When should a developer bump up the version of a feature? >> > > > > > > >> > > > > > > Jun >> > > > > > > >> > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam < >> > > > > kpraka...@confluent.io >> > > > > > > >> > > > > > > wrote: >> > > > > > > >> > > > > > > > Hi Jun, >> > > > > > > > >> > > > > > > > I have updated the KIP for the item 111. >> > > > > > > > I'm in the process of addressing 100.6, and will provide an >> > > update >> > > > > > soon. >> > > > > > > > I think item 110 is still under discussion given we are now >> > > > > providing a >> > > > > > > way >> > > > > > > > to finalize >> > > > > > > > all features to their latest version levels. In any case, >> > please >> > > > let >> > > > > us >> > > > > > > > know >> > > > > > > > how you feel in response to Colin's comments on this topic. >> > > > > > > > >> > > > > > > > > 111. To put this in context, when we had IBP, the default >> > value >> > > > is >> > > > > > the >> > > > > > > > > current released version. So, if you are a brand new user, >> > you >> > > > > don't >> > > > > > > need >> > > > > > > > > to configure IBP and all new features will be immediately >> > > > available >> > > > > > in >> > > > > > > > the >> > > > > > > > > new cluster. If you are upgrading from an old version, >> you do >> > > > need >> > > > > to >> > > > > > > > > understand and configure IBP. I see a similar pattern here >> > for >> > > > > > > > > features. From the ease of use perspective, ideally, we >> > > shouldn't >> > > > > > > require >> > > > > > > > a >> > > > > > > > > new user to have an extra step such as running a bootstrap >> > > script >> > > > > > > unless >> > > > > > > > > it's truly necessary. If someone has a special need (all >> the >> > > > cases >> > > > > > you >> > > > > > > > > mentioned seem special cases?), they can configure a mode >> > such >> > > > that >> > > > > > > > > features are enabled/disabled manually. >> > > > > > > > >> > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if I >> > > didn't >> > > > > > > > understand >> > > > > > > > this need earlier. I have updated the KIP with the approach >> > that >> > > > > > whenever >> > > > > > > > the '/features' node is absent, the controller by default >> will >> > > > > > bootstrap >> > > > > > > > the node >> > > > > > > > to contain the latest feature levels. Here is the new >> section >> > in >> > > > the >> > > > > > KIP >> > > > > > > > describing >> > > > > > > > the same: >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues >> > > > > > > > >> > > > > > > > Next, as I explained in my response to Colin's suggestions, >> we >> > > are >> > > > > now >> > > > > > > > providing a `--finalize-latest-features` flag with the >> tooling. >> > > > This >> > > > > > lets >> > > > > > > > the sysadmin finalize all features known to the controller >> to >> > > their >> > > > > > > latest >> > > > > > > > version >> > > > > > > > levels. Please look at this section (point #3 and the >> tooling >> > > > example >> > > > > > > > later): >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport >> > > > > > > > >> > > > > > > > >> > > > > > > > Do you feel this addresses your comment/concern? >> > > > > > > > >> > > > > > > > >> > > > > > > > Cheers, >> > > > > > > > Kowshik >> > > > > > > > >> > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <j...@confluent.io> >> > > wrote: >> > > > > > > > >> > > > > > > > > Hi, Kowshik, >> > > > > > > > > >> > > > > > > > > Thanks for the reply. A few more replies below. >> > > > > > > > > >> > > > > > > > > 100.6 You can look for the sentence "This operation >> requires >> > > > ALTER >> > > > > on >> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in >> > > > > > > > > KafkaApis.authorize(). >> > > > > > > > > >> > > > > > > > > 110. From the external client/tooling perspective, it's >> more >> > > > > natural >> > > > > > to >> > > > > > > > use >> > > > > > > > > the release version for features. If we can use the same >> > > release >> > > > > > > version >> > > > > > > > > for internal representation, it seems simpler (easier to >> > > > > understand, >> > > > > > no >> > > > > > > > > mapping overhead, etc). Is there a benefit with separate >> > > external >> > > > > and >> > > > > > > > > internal versioning schemes? >> > > > > > > > > >> > > > > > > > > 111. To put this in context, when we had IBP, the default >> > value >> > > > is >> > > > > > the >> > > > > > > > > current released version. So, if you are a brand new user, >> > you >> > > > > don't >> > > > > > > need >> > > > > > > > > to configure IBP and all new features will be immediately >> > > > available >> > > > > > in >> > > > > > > > the >> > > > > > > > > new cluster. If you are upgrading from an old version, >> you do >> > > > need >> > > > > to >> > > > > > > > > understand and configure IBP. I see a similar pattern here >> > for >> > > > > > > > > features. From the ease of use perspective, ideally, we >> > > shouldn't >> > > > > > > > require a >> > > > > > > > > new user to have an extra step such as running a bootstrap >> > > script >> > > > > > > unless >> > > > > > > > > it's truly necessary. If someone has a special need (all >> the >> > > > cases >> > > > > > you >> > > > > > > > > mentioned seem special cases?), they can configure a mode >> > such >> > > > that >> > > > > > > > > features are enabled/disabled manually. >> > > > > > > > > >> > > > > > > > > Jun >> > > > > > > > > >> > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam < >> > > > > > > kpraka...@confluent.io> >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi Jun, >> > > > > > > > > > >> > > > > > > > > > Thanks for the feedback and suggestions. Please find my >> > > > response >> > > > > > > below. >> > > > > > > > > > >> > > > > > > > > > > 100.6 For every new request, the admin needs to >> control >> > who >> > > > is >> > > > > > > > allowed >> > > > > > > > > to >> > > > > > > > > > > issue that request if security is enabled. So, we >> need to >> > > > > assign >> > > > > > > the >> > > > > > > > > new >> > > > > > > > > > > request a ResourceType and possible AclOperations. See >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment >> > > > > > > > > > > as an example. >> > > > > > > > > > >> > > > > > > > > > (Kowshik): I don't see any reference to the words >> > > ResourceType >> > > > or >> > > > > > > > > > AclOperations >> > > > > > > > > > in the KIP. Please let me know how I can use the KIP >> that >> > you >> > > > > > linked >> > > > > > > to >> > > > > > > > > > know how to >> > > > > > > > > > setup the appropriate ResourceType and/or >> ClusterOperation? >> > > > > > > > > > >> > > > > > > > > > > 105. If we change delete to disable, it's better to do >> > this >> > > > > > > > > consistently >> > > > > > > > > > in >> > > > > > > > > > > request protocol and admin api as well. >> > > > > > > > > > >> > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when >> it is >> > > > > > deleting >> > > > > > > a >> > > > > > > > > > feature. >> > > > > > > > > > I've just changed the KIP to use 'delete'. I don't have >> a >> > > > strong >> > > > > > > > > > preference. >> > > > > > > > > > >> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64. >> > > > > Currently, >> > > > > > > our >> > > > > > > > > > > release version schema is major.minor.bugfix (e.g. >> > 2.5.0). >> > > > It's >> > > > > > > > > possible >> > > > > > > > > > > for new features to be included in minor releases too. >> > > Should >> > > > > we >> > > > > > > make >> > > > > > > > > the >> > > > > > > > > > > feature versioning match the release versioning? >> > > > > > > > > > >> > > > > > > > > > (Kowshik): The release version can be mapped to a set of >> > > > feature >> > > > > > > > > versions, >> > > > > > > > > > and this can be done, for example in the tool (or even >> > > external >> > > > > to >> > > > > > > the >> > > > > > > > > > tool). >> > > > > > > > > > Can you please clarify what I'm missing? >> > > > > > > > > > >> > > > > > > > > > > 111. "During regular operations, the data in the ZK >> node >> > > can >> > > > be >> > > > > > > > mutated >> > > > > > > > > > > only via a specific admin API served only by the >> > > > controller." I >> > > > > > am >> > > > > > > > > > > wondering why can't the controller auto finalize a >> > feature >> > > > > > version >> > > > > > > > > after >> > > > > > > > > > > all brokers are upgraded? For new users who download >> the >> > > > latest >> > > > > > > > version >> > > > > > > > > > to >> > > > > > > > > > > build a new cluster, it's inconvenient for them to >> have >> > to >> > > > > > manually >> > > > > > > > > > enable >> > > > > > > > > > > each feature. >> > > > > > > > > > >> > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but >> it >> > > will >> > > > > help >> > > > > > > > > > to decide whether the automation can be thought through >> in >> > > the >> > > > > > future >> > > > > > > > > > in a follow up KIP, or right now in this KIP. We may >> invest >> > > > > > > > > > in automation, but we have to decide whether we should >> do >> > it >> > > > > > > > > > now or later. >> > > > > > > > > > >> > > > > > > > > > For the inconvenience that you mentioned, do you think >> the >> > > > > problem >> > > > > > > that >> > > > > > > > > you >> > > > > > > > > > mentioned can be overcome by asking for the cluster >> > operator >> > > > to >> > > > > > run >> > > > > > > a >> > > > > > > > > > bootstrap script when he/she knows that a specific AK >> > > release >> > > > > has >> > > > > > > been >> > > > > > > > > > almost completely deployed in a cluster for the first >> time? >> > > > Idea >> > > > > is >> > > > > > > > that >> > > > > > > > > > the >> > > > > > > > > > bootstrap script will know how to map a specific AK >> release >> > > to >> > > > > > > > finalized >> > > > > > > > > > feature versions, and run the `kafka-features.sh` tool >> > > > > > appropriately >> > > > > > > > > > against >> > > > > > > > > > the cluster. >> > > > > > > > > > >> > > > > > > > > > Now, coming back to your automation proposal/question. >> > > > > > > > > > I do see the value of automated feature version >> > finalization, >> > > > > but I >> > > > > > > > also >> > > > > > > > > > see >> > > > > > > > > > that this will open up several questions and some >> risks, as >> > > > > > explained >> > > > > > > > > > below. >> > > > > > > > > > The answers to these depend on the definition of the >> > > automation >> > > > > we >> > > > > > > > choose >> > > > > > > > > > to build, and how well does it fit into a kafka >> deployment. >> > > > > > > > > > Basically, it can be unsafe for the controller to >> finalize >> > > > > feature >> > > > > > > > > version >> > > > > > > > > > upgrades automatically, without learning about the >> intent >> > of >> > > > the >> > > > > > > > cluster >> > > > > > > > > > operator. >> > > > > > > > > > 1. We would sometimes want to lock feature versions only >> > when >> > > > we >> > > > > > have >> > > > > > > > > > externally verified >> > > > > > > > > > the stability of the broker binary. >> > > > > > > > > > 2. Sometimes only the cluster operator knows that a >> cluster >> > > > > upgrade >> > > > > > > is >> > > > > > > > > > complete, >> > > > > > > > > > and new brokers are highly unlikely to join the cluster. >> > > > > > > > > > 3. Only the cluster operator knows that the intent is to >> > > deploy >> > > > > the >> > > > > > > > same >> > > > > > > > > > version >> > > > > > > > > > of the new broker release across the entire cluster >> (i.e. >> > the >> > > > > > latest >> > > > > > > > > > downloaded version). >> > > > > > > > > > 4. For downgrades, it appears the controller still needs >> > some >> > > > > > > external >> > > > > > > > > > input >> > > > > > > > > > (such as the proposed tool) to finalize a feature >> version >> > > > > > downgrade. >> > > > > > > > > > >> > > > > > > > > > If we have automation, that automation can end up >> failing >> > in >> > > > some >> > > > > > of >> > > > > > > > the >> > > > > > > > > > cases >> > > > > > > > > > above. Then, we need a way to declare that the cluster >> is >> > > "not >> > > > > > ready" >> > > > > > > > if >> > > > > > > > > > the >> > > > > > > > > > controller cannot automatically finalize some basic >> > required >> > > > > > feature >> > > > > > > > > > version >> > > > > > > > > > upgrades across the cluster. We need to make the cluster >> > > > operator >> > > > > > > aware >> > > > > > > > > in >> > > > > > > > > > such a scenario (raise an alert or alike). >> > > > > > > > > > >> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey >> should >> > be >> > > 49 >> > > > > > > instead >> > > > > > > > > of >> > > > > > > > > > 48. >> > > > > > > > > > >> > > > > > > > > > (Kowshik): Done. >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > Cheers, >> > > > > > > > > > Kowshik >> > > > > > > > > > >> > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao < >> j...@confluent.io> >> > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Hi, Kowshik, >> > > > > > > > > > > >> > > > > > > > > > > Thanks for the reply. A few more comments below. >> > > > > > > > > > > >> > > > > > > > > > > 100.6 For every new request, the admin needs to >> control >> > who >> > > > is >> > > > > > > > allowed >> > > > > > > > > to >> > > > > > > > > > > issue that request if security is enabled. So, we >> need to >> > > > > assign >> > > > > > > the >> > > > > > > > > new >> > > > > > > > > > > request a ResourceType and possible AclOperations. See >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment >> > > > > > > > > > > as >> > > > > > > > > > > an example. >> > > > > > > > > > > >> > > > > > > > > > > 105. If we change delete to disable, it's better to do >> > this >> > > > > > > > > consistently >> > > > > > > > > > in >> > > > > > > > > > > request protocol and admin api as well. >> > > > > > > > > > > >> > > > > > > > > > > 110. The minVersion/maxVersion for features use int64. >> > > > > Currently, >> > > > > > > our >> > > > > > > > > > > release version schema is major.minor.bugfix (e.g. >> > 2.5.0). >> > > > It's >> > > > > > > > > possible >> > > > > > > > > > > for new features to be included in minor releases too. >> > > Should >> > > > > we >> > > > > > > make >> > > > > > > > > the >> > > > > > > > > > > feature versioning match the release versioning? >> > > > > > > > > > > >> > > > > > > > > > > 111. "During regular operations, the data in the ZK >> node >> > > can >> > > > be >> > > > > > > > mutated >> > > > > > > > > > > only via a specific admin API served only by the >> > > > controller." I >> > > > > > am >> > > > > > > > > > > wondering why can't the controller auto finalize a >> > feature >> > > > > > version >> > > > > > > > > after >> > > > > > > > > > > all brokers are upgraded? For new users who download >> the >> > > > latest >> > > > > > > > version >> > > > > > > > > > to >> > > > > > > > > > > build a new cluster, it's inconvenient for them to >> have >> > to >> > > > > > manually >> > > > > > > > > > enable >> > > > > > > > > > > each feature. >> > > > > > > > > > > >> > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey >> should >> > be >> > > 49 >> > > > > > > instead >> > > > > > > > > of >> > > > > > > > > > > 48. >> > > > > > > > > > > >> > > > > > > > > > > Jun >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam < >> > > > > > > > > kpraka...@confluent.io> >> > > > > > > > > > > wrote: >> > > > > > > > > > > >> > > > > > > > > > > > Hey Jun, >> > > > > > > > > > > > >> > > > > > > > > > > > Thanks a lot for the great feedback! Please note >> that >> > the >> > > > > > design >> > > > > > > > > > > > has changed a little bit on the KIP, and we now >> > propagate >> > > > the >> > > > > > > > > finalized >> > > > > > > > > > > > features metadata only via ZK watches (instead of >> > > > > > > > > UpdateMetadataRequest >> > > > > > > > > > > > from the controller). >> > > > > > > > > > > > >> > > > > > > > > > > > Please find below my response to your >> > questions/feedback, >> > > > > with >> > > > > > > the >> > > > > > > > > > prefix >> > > > > > > > > > > > "(Kowshik):". >> > > > > > > > > > > > >> > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse >> > > > > > > > > > > > > 100.1 Since this request waits for responses from >> > > > brokers, >> > > > > > > should >> > > > > > > > > we >> > > > > > > > > > > add >> > > > > > > > > > > > a >> > > > > > > > > > > > > timeout in the request (like createTopicRequest)? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Great point! Done. I have added a timeout >> > > field. >> > > > > > Note: >> > > > > > > > we >> > > > > > > > > no >> > > > > > > > > > > > longer >> > > > > > > > > > > > wait for responses from brokers, since the design >> has >> > > been >> > > > > > > changed >> > > > > > > > so >> > > > > > > > > > > that >> > > > > > > > > > > > the >> > > > > > > > > > > > features information is propagated via ZK. >> > Nevertheless, >> > > it >> > > > > is >> > > > > > > > right >> > > > > > > > > to >> > > > > > > > > > > > have a timeout >> > > > > > > > > > > > for the request. >> > > > > > > > > > > > >> > > > > > > > > > > > > 100.2 The response schema is a bit weird. >> Typically, >> > > the >> > > > > > > response >> > > > > > > > > > just >> > > > > > > > > > > > > shows an error code and an error message, instead >> of >> > > > > echoing >> > > > > > > the >> > > > > > > > > > > request. >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to >> > just >> > > > > return >> > > > > > > an >> > > > > > > > > > error >> > > > > > > > > > > > code and a message. >> > > > > > > > > > > > Previously it was not echoing the "request", rather >> it >> > > was >> > > > > > > > returning >> > > > > > > > > > the >> > > > > > > > > > > > latest set of >> > > > > > > > > > > > cluster-wide finalized features (after applying the >> > > > updates). >> > > > > > But >> > > > > > > > you >> > > > > > > > > > are >> > > > > > > > > > > > right, >> > > > > > > > > > > > the additional info is not required, so I have >> removed >> > it >> > > > > from >> > > > > > > the >> > > > > > > > > > > response >> > > > > > > > > > > > schema. >> > > > > > > > > > > > >> > > > > > > > > > > > > 100.3 Should we add a separate request to >> > list/describe >> > > > the >> > > > > > > > > existing >> > > > > > > > > > > > > features? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): This is already present in the KIP via >> the >> > > > > > > > > > 'DescribeFeatures' >> > > > > > > > > > > > Admin API, >> > > > > > > > > > > > which, underneath covers uses the >> ApiVersionsRequest to >> > > > > > > > list/describe >> > > > > > > > > > the >> > > > > > > > > > > > existing features. Please read the 'Tooling support' >> > > > section. >> > > > > > > > > > > > >> > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a >> > > single >> > > > > > > request. >> > > > > > > > > For >> > > > > > > > > > > > > DELETE, the version field doesn't make sense. So, >> I >> > > guess >> > > > > the >> > > > > > > > > broker >> > > > > > > > > > > just >> > > > > > > > > > > > > ignores this? An alternative way is to have a >> > separate >> > > > > > > > > > > > DeleteFeaturesRequest >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now >> to >> > > > have 2 >> > > > > > > > > separate >> > > > > > > > > > > > controller APIs >> > > > > > > > > > > > serving these different purposes: >> > > > > > > > > > > > 1. updateFeatures >> > > > > > > > > > > > 2. deleteFeatures >> > > > > > > > > > > > >> > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The >> > > > monotonically >> > > > > > > > > > increasing >> > > > > > > > > > > > > version of the metadata for finalized features." >> I am >> > > > > > wondering >> > > > > > > > why >> > > > > > > > > > the >> > > > > > > > > > > > > ordering is important? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called >> > epoch >> > > > > > > (instead >> > > > > > > > of >> > > > > > > > > > > > version), and >> > > > > > > > > > > > it is just the ZK node version. Basically, this is >> the >> > > > epoch >> > > > > > for >> > > > > > > > the >> > > > > > > > > > > > cluster-wide >> > > > > > > > > > > > finalized feature version metadata. This metadata is >> > > served >> > > > > to >> > > > > > > > > clients >> > > > > > > > > > > via >> > > > > > > > > > > > the >> > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate >> updates >> > > from >> > > > > the >> > > > > > > > > > > '/features' >> > > > > > > > > > > > ZK node >> > > > > > > > > > > > to all brokers, via ZK watches setup by each broker >> on >> > > the >> > > > > > > > > '/features' >> > > > > > > > > > > > node. >> > > > > > > > > > > > >> > > > > > > > > > > > Now here is why the ordering is important: >> > > > > > > > > > > > ZK watches don't propagate at the same time. As a >> > result, >> > > > the >> > > > > > > > > > > > ApiVersionsResponse >> > > > > > > > > > > > is eventually consistent across brokers. This can >> > > introduce >> > > > > > cases >> > > > > > > > > > > > where clients see an older lower epoch of the >> features >> > > > > > metadata, >> > > > > > > > > after >> > > > > > > > > > a >> > > > > > > > > > > > more recent >> > > > > > > > > > > > higher epoch was returned at a previous point in >> time. >> > We >> > > > > > expect >> > > > > > > > > > clients >> > > > > > > > > > > > to always employ the rule that the latest received >> > higher >> > > > > epoch >> > > > > > > of >> > > > > > > > > > > metadata >> > > > > > > > > > > > always trumps an older smaller epoch. Those clients >> > that >> > > > are >> > > > > > > > external >> > > > > > > > > > to >> > > > > > > > > > > > Kafka should strongly consider discovering the >> latest >> > > > > metadata >> > > > > > > once >> > > > > > > > > > > during >> > > > > > > > > > > > startup from the brokers, and if required refresh >> the >> > > > > metadata >> > > > > > > > > > > periodically >> > > > > > > > > > > > (to get the latest metadata). >> > > > > > > > > > > > >> > > > > > > > > > > > > 100.6 Could you specify the required ACL for this >> new >> > > > > > request? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out >> which >> > > one >> > > > to >> > > > > > > > > specify? >> > > > > > > > > > > > Please could you provide me some pointers? I'll be >> glad >> > > to >> > > > > > update >> > > > > > > > the >> > > > > > > > > > > > KIP once I know the next steps. >> > > > > > > > > > > > >> > > > > > > > > > > > > 101. For the broker registration ZK node, should >> we >> > > bump >> > > > up >> > > > > > the >> > > > > > > > > > version >> > > > > > > > > > > > in >> > > > > > > > > > > > the json? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the >> > version >> > > in >> > > > > the >> > > > > > > > > broker >> > > > > > > > > > > json >> > > > > > > > > > > > by 1. >> > > > > > > > > > > > >> > > > > > > > > > > > > 102. For the /features ZK node, not sure if we >> need >> > the >> > > > > epoch >> > > > > > > > > field. >> > > > > > > > > > > Each >> > > > > > > > > > > > > ZK node has an internal version field that is >> > > incremented >> > > > > on >> > > > > > > > every >> > > > > > > > > > > > update. >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node >> > > version >> > > > > > now, >> > > > > > > > > > instead >> > > > > > > > > > > of >> > > > > > > > > > > > explicitly >> > > > > > > > > > > > incremented epoch. >> > > > > > > > > > > > >> > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature >> > > version >> > > > > > > > > cluster-wide >> > > > > > > > > > > is >> > > > > > > > > > > > > left to the discretion of the logic implementing >> the >> > > > > feature >> > > > > > > (ex: >> > > > > > > > > can >> > > > > > > > > > > be >> > > > > > > > > > > > > done via dynamic broker config)." Does that mean >> the >> > > > broker >> > > > > > > > > > > registration >> > > > > > > > > > > > ZK >> > > > > > > > > > > > > node will be updated dynamically when this >> happens? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Not really. The text was just conveying >> > that a >> > > > > > broker >> > > > > > > > > could >> > > > > > > > > > > > "know" of >> > > > > > > > > > > > a new feature version, but it does not mean the >> broker >> > > > should >> > > > > > > have >> > > > > > > > > also >> > > > > > > > > > > > activated the effects of the feature version. >> Knowing >> > vs >> > > > > > > activation >> > > > > > > > > > are 2 >> > > > > > > > > > > > separate things, >> > > > > > > > > > > > and the latter can be achieved by dynamic config. I >> > have >> > > > > > reworded >> > > > > > > > the >> > > > > > > > > > > text >> > > > > > > > > > > > to >> > > > > > > > > > > > make this clear to the reader. >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > 104. UpdateMetadataRequest >> > > > > > > > > > > > > 104.1 It would be useful to describe when the >> feature >> > > > > > metadata >> > > > > > > is >> > > > > > > > > > > > included >> > > > > > > > > > > > > in the request. My understanding is that it's only >> > > > included >> > > > > > if >> > > > > > > > (1) >> > > > > > > > > > > there >> > > > > > > > > > > > is >> > > > > > > > > > > > > a change to the finalized feature; (2) broker >> > restart; >> > > > (3) >> > > > > > > > > controller >> > > > > > > > > > > > > failover. >> > > > > > > > > > > > > 104.2 The new fields have the following versions. >> Why >> > > are >> > > > > the >> > > > > > > > > > versions >> > > > > > > > > > > 3+ >> > > > > > > > > > > > > when the top version is bumped to 6? >> > > > > > > > > > > > > "fields": [ >> > > > > > > > > > > > > {"name": "Name", "type": "string", >> > "versions": >> > > > > > "3+", >> > > > > > > > > > > > > "about": "The name of the feature."}, >> > > > > > > > > > > > > {"name": "Version", "type": "int64", >> > > > "versions": >> > > > > > > "3+", >> > > > > > > > > > > > > "about": "The finalized version for the >> > > > > feature."} >> > > > > > > > > > > > > ] >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): With the new improved design, we have >> > > completely >> > > > > > > > > eliminated >> > > > > > > > > > > the >> > > > > > > > > > > > need to >> > > > > > > > > > > > use UpdateMetadataRequest. This is because we now >> rely >> > on >> > > > ZK >> > > > > to >> > > > > > > > > deliver >> > > > > > > > > > > the >> > > > > > > > > > > > notifications for changes to the '/features' ZK >> node. >> > > > > > > > > > > > >> > > > > > > > > > > > > 105. kafka-features.sh: Instead of using >> > update/delete, >> > > > > > perhaps >> > > > > > > > > it's >> > > > > > > > > > > > better >> > > > > > > > > > > > > to use enable/disable? >> > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so >> that >> > we >> > > > > > instead >> > > > > > > > call >> > > > > > > > > > it >> > > > > > > > > > > > 'disable'. >> > > > > > > > > > > > However for 'update', it can now also refer to >> either >> > an >> > > > > > upgrade >> > > > > > > > or a >> > > > > > > > > > > > forced downgrade. >> > > > > > > > > > > > Therefore, I have left it the way it is, just >> calling >> > it >> > > as >> > > > > > just >> > > > > > > > > > > 'update'. >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > Cheers, >> > > > > > > > > > > > Kowshik >> > > > > > > > > > > > >> > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao < >> > > j...@confluent.io> >> > > > > > > wrote: >> > > > > > > > > > > > >> > > > > > > > > > > > > Hi, Kowshik, >> > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few >> > comments >> > > > > below. >> > > > > > > > > > > > > >> > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse >> > > > > > > > > > > > > 100.1 Since this request waits for responses from >> > > > brokers, >> > > > > > > should >> > > > > > > > > we >> > > > > > > > > > > add >> > > > > > > > > > > > a >> > > > > > > > > > > > > timeout in the request (like createTopicRequest)? >> > > > > > > > > > > > > 100.2 The response schema is a bit weird. >> Typically, >> > > the >> > > > > > > response >> > > > > > > > > > just >> > > > > > > > > > > > > shows an error code and an error message, instead >> of >> > > > > echoing >> > > > > > > the >> > > > > > > > > > > request. >> > > > > > > > > > > > > 100.3 Should we add a separate request to >> > list/describe >> > > > the >> > > > > > > > > existing >> > > > > > > > > > > > > features? >> > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a >> > > single >> > > > > > > request. >> > > > > > > > > For >> > > > > > > > > > > > > DELETE, the version field doesn't make sense. So, >> I >> > > guess >> > > > > the >> > > > > > > > > broker >> > > > > > > > > > > just >> > > > > > > > > > > > > ignores this? An alternative way is to have a >> > separate >> > > > > > > > > > > > > DeleteFeaturesRequest >> > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The >> > > > monotonically >> > > > > > > > > > increasing >> > > > > > > > > > > > > version of the metadata for finalized features." >> I am >> > > > > > wondering >> > > > > > > > why >> > > > > > > > > > the >> > > > > > > > > > > > > ordering is important? >> > > > > > > > > > > > > 100.6 Could you specify the required ACL for this >> new >> > > > > > request? >> > > > > > > > > > > > > >> > > > > > > > > > > > > 101. For the broker registration ZK node, should >> we >> > > bump >> > > > up >> > > > > > the >> > > > > > > > > > version >> > > > > > > > > > > > in >> > > > > > > > > > > > > the json? >> > > > > > > > > > > > > >> > > > > > > > > > > > > 102. For the /features ZK node, not sure if we >> need >> > the >> > > > > epoch >> > > > > > > > > field. >> > > > > > > > > > > Each >> > > > > > > > > > > > > ZK node has an internal version field that is >> > > incremented >> > > > > on >> > > > > > > > every >> > > > > > > > > > > > update. >> > > > > > > > > > > > > >> > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature >> > > version >> > > > > > > > > cluster-wide >> > > > > > > > > > > is >> > > > > > > > > > > > > left to the discretion of the logic implementing >> the >> > > > > feature >> > > > > > > (ex: >> > > > > > > > > can >> > > > > > > > > > > be >> > > > > > > > > > > > > done via dynamic broker config)." Does that mean >> the >> > > > broker >> > > > > > > > > > > registration >> > > > > > > > > > > > ZK >> > > > > > > > > > > > > node will be updated dynamically when this >> happens? >> > > > > > > > > > > > > >> > > > > > > > > > > > > 104. UpdateMetadataRequest >> > > > > > > > > > > > > 104.1 It would be useful to describe when the >> feature >> > > > > > metadata >> > > > > > > is >> > > > > > > > > > > > included >> > > > > > > > > > > > > in the request. My understanding is that it's only >> > > > included >> > > > > > if >> > > > > > > > (1) >> > > > > > > > > > > there >> > > > > > > > > > > > is >> > > > > > > > > > > > > a change to the finalized feature; (2) broker >> > restart; >> > > > (3) >> > > > > > > > > controller >> > > > > > > > > > > > > failover. >> > > > > > > > > > > > > 104.2 The new fields have the following versions. >> Why >> > > are >> > > > > the >> > > > > > > > > > versions >> > > > > > > > > > > 3+ >> > > > > > > > > > > > > when the top version is bumped to 6? >> > > > > > > > > > > > > "fields": [ >> > > > > > > > > > > > > {"name": "Name", "type": "string", >> > "versions": >> > > > > > "3+", >> > > > > > > > > > > > > "about": "The name of the feature."}, >> > > > > > > > > > > > > {"name": "Version", "type": "int64", >> > > > "versions": >> > > > > > > "3+", >> > > > > > > > > > > > > "about": "The finalized version for the >> > > > > feature."} >> > > > > > > > > > > > > ] >> > > > > > > > > > > > > >> > > > > > > > > > > > > 105. kafka-features.sh: Instead of using >> > update/delete, >> > > > > > perhaps >> > > > > > > > > it's >> > > > > > > > > > > > better >> > > > > > > > > > > > > to use enable/disable? >> > > > > > > > > > > > > >> > > > > > > > > > > > > Jun >> > > > > > > > > > > > > >> > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam < >> > > > > > > > > > > kpraka...@confluent.io >> > > > > > > > > > > > > >> > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > >> > > > > > > > > > > > > > Hey Boyang, >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > Thanks for the great feedback! I have updated >> the >> > KIP >> > > > > based >> > > > > > > on >> > > > > > > > > your >> > > > > > > > > > > > > > feedback. >> > > > > > > > > > > > > > Please find my response below for your comments, >> > look >> > > > for >> > > > > > > > > sentences >> > > > > > > > > > > > > > starting >> > > > > > > > > > > > > > with "(Kowshik)" below. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin >> > > handling >> > > > > EOS >> > > > > > > > > > traffic" >> > > > > > > > > > > > > could >> > > > > > > > > > > > > > be >> > > > > > > > > > > > > > > converted as "When is it safe for the brokers >> to >> > > > start >> > > > > > > > serving >> > > > > > > > > > new >> > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not >> > > > explained >> > > > > > > > earlier >> > > > > > > > > > in >> > > > > > > > > > > > the >> > > > > > > > > > > > > > > context. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata >> > > version >> > > > > > > number >> > > > > > > > > part >> > > > > > > > > > > > > seems a >> > > > > > > > > > > > > > > bit blurred. Could you point a reference to >> later >> > > > > section >> > > > > > > > that >> > > > > > > > > we >> > > > > > > > > > > > going >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > store it in Zookeeper and update it every time >> > when >> > > > > there >> > > > > > > is >> > > > > > > > a >> > > > > > > > > > > > feature >> > > > > > > > > > > > > > > change? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a >> > reference >> > > in >> > > > > the >> > > > > > > > KIP. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a >> > > > Non-goal >> > > > > of >> > > > > > > the >> > > > > > > > > > KIP, >> > > > > > > > > > > > for >> > > > > > > > > > > > > > > features such as group coordinator semantics, >> > there >> > > > is >> > > > > no >> > > > > > > > legal >> > > > > > > > > > > > > scenario >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > perform a downgrade at all. So having >> downgrade >> > > door >> > > > > open >> > > > > > > is >> > > > > > > > > > pretty >> > > > > > > > > > > > > > > error-prone as human faults happen all the >> time. >> > > I'm >> > > > > > > assuming >> > > > > > > > > as >> > > > > > > > > > > new >> > > > > > > > > > > > > > > features are implemented, it's not very hard >> to >> > > add a >> > > > > > flag >> > > > > > > > > during >> > > > > > > > > > > > > feature >> > > > > > > > > > > > > > > creation to indicate whether this feature is >> > > > > > > "downgradable". >> > > > > > > > > > Could >> > > > > > > > > > > > you >> > > > > > > > > > > > > > > explain a bit more on the extra engineering >> > effort >> > > > for >> > > > > > > > shipping >> > > > > > > > > > > this >> > > > > > > > > > > > > KIP >> > > > > > > > > > > > > > > with downgrade protection in place? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree >> > here. >> > > > > While >> > > > > > I >> > > > > > > > > agree >> > > > > > > > > > > that >> > > > > > > > > > > > > > accidental >> > > > > > > > > > > > > > downgrades can cause problems, I also think >> > sometimes >> > > > > > > > downgrades >> > > > > > > > > > > should >> > > > > > > > > > > > > > be allowed for emergency reasons (not all >> > downgrades >> > > > > cause >> > > > > > > > > issues). >> > > > > > > > > > > > > > It is just subjective to the feature being >> > > downgraded. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > To be more strict about feature version >> > downgrades, I >> > > > > have >> > > > > > > > > modified >> > > > > > > > > > > the >> > > > > > > > > > > > > KIP >> > > > > > > > > > > > > > proposing that we mandate a `--force-downgrade` >> > flag >> > > be >> > > > > > used >> > > > > > > in >> > > > > > > > > the >> > > > > > > > > > > > > > UPDATE_FEATURES api >> > > > > > > > > > > > > > and the tooling, whenever the human is >> downgrading >> > a >> > > > > > > finalized >> > > > > > > > > > > feature >> > > > > > > > > > > > > > version. >> > > > > > > > > > > > > > Hopefully this should cover the requirement, >> until >> > we >> > > > > find >> > > > > > > the >> > > > > > > > > need >> > > > > > > > > > > for >> > > > > > > > > > > > > > advanced downgrade support. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of >> feature >> > > > > > versions >> > > > > > > > will >> > > > > > > > > > be >> > > > > > > > > > > > > > defined >> > > > > > > > > > > > > > > in the broker code." So this means in order to >> > > > > restrict a >> > > > > > > > > certain >> > > > > > > > > > > > > > feature, >> > > > > > > > > > > > > > > we need to start the broker first and then >> send a >> > > > > feature >> > > > > > > > > gating >> > > > > > > > > > > > > request >> > > > > > > > > > > > > > > immediately, which introduces a time gap and >> the >> > > > > > > > > > intended-to-close >> > > > > > > > > > > > > > feature >> > > > > > > > > > > > > > > could actually serve request during this >> phase. >> > Do >> > > > you >> > > > > > > think >> > > > > > > > we >> > > > > > > > > > > > should >> > > > > > > > > > > > > > also >> > > > > > > > > > > > > > > support configurations as well so that admin >> user >> > > > could >> > > > > > > > freely >> > > > > > > > > > roll >> > > > > > > > > > > > up >> > > > > > > > > > > > > a >> > > > > > > > > > > > > > > cluster with all nodes complying the same >> feature >> > > > > gating, >> > > > > > > > > without >> > > > > > > > > > > > > > worrying >> > > > > > > > > > > > > > > about the turnaround time to propagate the >> > message >> > > > only >> > > > > > > after >> > > > > > > > > the >> > > > > > > > > > > > > cluster >> > > > > > > > > > > > > > > starts up? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): This is a great point/question. One >> of >> > the >> > > > > > > > > expectations >> > > > > > > > > > > out >> > > > > > > > > > > > of >> > > > > > > > > > > > > > this KIP, which is >> > > > > > > > > > > > > > already followed in the broker, is the >> following. >> > > > > > > > > > > > > > - Imagine at time T1 the broker starts up and >> > > > registers >> > > > > > it’s >> > > > > > > > > > > presence >> > > > > > > > > > > > in >> > > > > > > > > > > > > > ZK, >> > > > > > > > > > > > > > along with advertising it’s supported >> features. >> > > > > > > > > > > > > > - Imagine at a future time T2 the broker >> receives >> > > the >> > > > > > > > > > > > > > UpdateMetadataRequest >> > > > > > > > > > > > > > from the controller, which contains the >> latest >> > > > > finalized >> > > > > > > > > > features >> > > > > > > > > > > as >> > > > > > > > > > > > > > seen by >> > > > > > > > > > > > > > the controller. The broker validates this >> data >> > > > against >> > > > > > > it’s >> > > > > > > > > > > > supported >> > > > > > > > > > > > > > features to >> > > > > > > > > > > > > > make sure there is no mismatch (it will >> shutdown >> > > if >> > > > > > there >> > > > > > > is >> > > > > > > > > an >> > > > > > > > > > > > > > incompatibility). >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > It is expected that during the time between the >> 2 >> > > > events >> > > > > T1 >> > > > > > > and >> > > > > > > > > T2, >> > > > > > > > > > > the >> > > > > > > > > > > > > > broker is >> > > > > > > > > > > > > > almost a silent entity in the cluster. It does >> not >> > > add >> > > > > any >> > > > > > > > value >> > > > > > > > > to >> > > > > > > > > > > the >> > > > > > > > > > > > > > cluster, or carry >> > > > > > > > > > > > > > out any important broker activities. By >> > “important”, >> > > I >> > > > > mean >> > > > > > > it >> > > > > > > > is >> > > > > > > > > > not >> > > > > > > > > > > > > doing >> > > > > > > > > > > > > > mutations >> > > > > > > > > > > > > > on it’s persistence, not mutating critical >> > in-memory >> > > > > state, >> > > > > > > > won’t >> > > > > > > > > > be >> > > > > > > > > > > > > > serving >> > > > > > > > > > > > > > produce/fetch requests. Note it doesn’t even >> know >> > > it’s >> > > > > > > assigned >> > > > > > > > > > > > > partitions >> > > > > > > > > > > > > > until >> > > > > > > > > > > > > > it receives UpdateMetadataRequest from >> controller. >> > > > > Anything >> > > > > > > the >> > > > > > > > > > > broker >> > > > > > > > > > > > is >> > > > > > > > > > > > > > doing up >> > > > > > > > > > > > > > until this point is not damaging/useful. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > I’ve clarified the above in the KIP, see this >> new >> > > > > section: >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime >> > > > > > > > > > > > > > . >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 5. "adding a new Feature, updating or >> deleting an >> > > > > > existing >> > > > > > > > > > > Feature", >> > > > > > > > > > > > > may >> > > > > > > > > > > > > > be >> > > > > > > > > > > > > > > I misunderstood something, I thought the >> features >> > > are >> > > > > > > defined >> > > > > > > > > in >> > > > > > > > > > > > broker >> > > > > > > > > > > > > > > code, so admin could not really create a new >> > > feature? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! You understood this >> right. >> > > Here >> > > > > > > adding >> > > > > > > > a >> > > > > > > > > > > > feature >> > > > > > > > > > > > > > means we are >> > > > > > > > > > > > > > adding a cluster-wide finalized *max* version >> for a >> > > > > feature >> > > > > > > > that >> > > > > > > > > > was >> > > > > > > > > > > > > > previously never finalized. >> > > > > > > > > > > > > > I have clarified this in the KIP now. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 6. I think we need a separate error code like >> > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > reject a concurrent feature update request. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP >> > > adding >> > > > > the >> > > > > > > > above >> > > > > > > > > > (see >> > > > > > > > > > > > > > 'Tooling support -> Admin API changes'). >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 7. I think we haven't discussed the >> alternative >> > > > > solution >> > > > > > to >> > > > > > > > > pass >> > > > > > > > > > > the >> > > > > > > > > > > > > > > feature information through Zookeeper. Is that >> > > > > mentioned >> > > > > > in >> > > > > > > > the >> > > > > > > > > > KIP >> > > > > > > > > > > > to >> > > > > > > > > > > > > > > justify why using UpdateMetadata is more >> > favorable? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads >> > finalized >> > > > > > feature >> > > > > > > > info >> > > > > > > > > > > > stored >> > > > > > > > > > > > > in >> > > > > > > > > > > > > > ZK, >> > > > > > > > > > > > > > only during startup when it does a validation. >> When >> > > > > serving >> > > > > > > > > > > > > > `ApiVersionsRequest`, the >> > > > > > > > > > > > > > broker does not read this info from ZK directly. >> > I'd >> > > > > > imagine >> > > > > > > > the >> > > > > > > > > > risk >> > > > > > > > > > > > is >> > > > > > > > > > > > > > that it can increase >> > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck for >> the >> > > > system. >> > > > > > > > Today, >> > > > > > > > > in >> > > > > > > > > > > > Kafka >> > > > > > > > > > > > > > we use the >> > > > > > > > > > > > > > controller to fan out ZK updates to brokers and >> we >> > > want >> > > > > to >> > > > > > > > stick >> > > > > > > > > to >> > > > > > > > > > > > that >> > > > > > > > > > > > > > pattern to avoid >> > > > > > > > > > > > > > the ZK read bottleneck when serving >> > > > `ApiVersionsRequest`. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 8. I was under the impression that user could >> > > > > configure a >> > > > > > > > range >> > > > > > > > > > of >> > > > > > > > > > > > > > > supported versions, what's the trade-off for >> > > allowing >> > > > > > > single >> > > > > > > > > > > > finalized >> > > > > > > > > > > > > > > version only? >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great question! The finalized version >> > of a >> > > > > > feature >> > > > > > > > > > > basically >> > > > > > > > > > > > > > refers to >> > > > > > > > > > > > > > the cluster-wide finalized feature "maximum" >> > version. >> > > > For >> > > > > > > > > example, >> > > > > > > > > > if >> > > > > > > > > > > > the >> > > > > > > > > > > > > > 'group_coordinator' feature >> > > > > > > > > > > > > > has the finalized version set to 10, then, it >> means >> > > > that >> > > > > > > > > > cluster-wide >> > > > > > > > > > > > all >> > > > > > > > > > > > > > versions upto v10 are >> > > > > > > > > > > > > > supported for this feature. However, note that >> if >> > > some >> > > > > > > version >> > > > > > > > > (ex: >> > > > > > > > > > > v0) >> > > > > > > > > > > > > > gets deprecated >> > > > > > > > > > > > > > for this feature, then we don’t convey that >> using >> > > this >> > > > > > scheme >> > > > > > > > > (also >> > > > > > > > > > > > > > supporting deprecation is a non-goal). >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all >> points, >> > > > > > refering >> > > > > > > to >> > > > > > > > > > > > finalized >> > > > > > > > > > > > > > feature "maximum" versions. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the >> > > "client" >> > > > > here >> > > > > > > may >> > > > > > > > > be >> > > > > > > > > > a >> > > > > > > > > > > > > > producer >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > Cheers, >> > > > > > > > > > > > > > Kowshik >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen < >> > > > > > > > > > > > reluctanthero...@gmail.com> >> > > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Hey Kowshik, >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple of >> > > > questions: >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin >> > > handling >> > > > > EOS >> > > > > > > > > > traffic" >> > > > > > > > > > > > > could >> > > > > > > > > > > > > > be >> > > > > > > > > > > > > > > converted as "When is it safe for the brokers >> to >> > > > start >> > > > > > > > serving >> > > > > > > > > > new >> > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not >> > > > explained >> > > > > > > > earlier >> > > > > > > > > > in >> > > > > > > > > > > > the >> > > > > > > > > > > > > > > context. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata >> > > version >> > > > > > > number >> > > > > > > > > part >> > > > > > > > > > > > > seems a >> > > > > > > > > > > > > > > bit blurred. Could you point a reference to >> later >> > > > > section >> > > > > > > > that >> > > > > > > > > we >> > > > > > > > > > > > going >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > store it in Zookeeper and update it every time >> > when >> > > > > there >> > > > > > > is >> > > > > > > > a >> > > > > > > > > > > > feature >> > > > > > > > > > > > > > > change? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a >> > > > Non-goal >> > > > > of >> > > > > > > the >> > > > > > > > > > KIP, >> > > > > > > > > > > > for >> > > > > > > > > > > > > > > features such as group coordinator semantics, >> > there >> > > > is >> > > > > no >> > > > > > > > legal >> > > > > > > > > > > > > scenario >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > perform a downgrade at all. So having >> downgrade >> > > door >> > > > > open >> > > > > > > is >> > > > > > > > > > pretty >> > > > > > > > > > > > > > > error-prone as human faults happen all the >> time. >> > > I'm >> > > > > > > assuming >> > > > > > > > > as >> > > > > > > > > > > new >> > > > > > > > > > > > > > > features are implemented, it's not very hard >> to >> > > add a >> > > > > > flag >> > > > > > > > > during >> > > > > > > > > > > > > feature >> > > > > > > > > > > > > > > creation to indicate whether this feature is >> > > > > > > "downgradable". >> > > > > > > > > > Could >> > > > > > > > > > > > you >> > > > > > > > > > > > > > > explain a bit more on the extra engineering >> > effort >> > > > for >> > > > > > > > shipping >> > > > > > > > > > > this >> > > > > > > > > > > > > KIP >> > > > > > > > > > > > > > > with downgrade protection in place? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of >> feature >> > > > > > versions >> > > > > > > > will >> > > > > > > > > > be >> > > > > > > > > > > > > > defined >> > > > > > > > > > > > > > > in the broker code." So this means in order to >> > > > > restrict a >> > > > > > > > > certain >> > > > > > > > > > > > > > feature, >> > > > > > > > > > > > > > > we need to start the broker first and then >> send a >> > > > > feature >> > > > > > > > > gating >> > > > > > > > > > > > > request >> > > > > > > > > > > > > > > immediately, which introduces a time gap and >> the >> > > > > > > > > > intended-to-close >> > > > > > > > > > > > > > feature >> > > > > > > > > > > > > > > could actually serve request during this >> phase. >> > Do >> > > > you >> > > > > > > think >> > > > > > > > we >> > > > > > > > > > > > should >> > > > > > > > > > > > > > also >> > > > > > > > > > > > > > > support configurations as well so that admin >> user >> > > > could >> > > > > > > > freely >> > > > > > > > > > roll >> > > > > > > > > > > > up >> > > > > > > > > > > > > a >> > > > > > > > > > > > > > > cluster with all nodes complying the same >> feature >> > > > > gating, >> > > > > > > > > without >> > > > > > > > > > > > > > worrying >> > > > > > > > > > > > > > > about the turnaround time to propagate the >> > message >> > > > only >> > > > > > > after >> > > > > > > > > the >> > > > > > > > > > > > > cluster >> > > > > > > > > > > > > > > starts up? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 5. "adding a new Feature, updating or >> deleting an >> > > > > > existing >> > > > > > > > > > > Feature", >> > > > > > > > > > > > > may >> > > > > > > > > > > > > > be >> > > > > > > > > > > > > > > I misunderstood something, I thought the >> features >> > > are >> > > > > > > defined >> > > > > > > > > in >> > > > > > > > > > > > broker >> > > > > > > > > > > > > > > code, so admin could not really create a new >> > > feature? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 6. I think we need a separate error code like >> > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > reject a concurrent feature update request. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 7. I think we haven't discussed the >> alternative >> > > > > solution >> > > > > > to >> > > > > > > > > pass >> > > > > > > > > > > the >> > > > > > > > > > > > > > > feature information through Zookeeper. Is that >> > > > > mentioned >> > > > > > in >> > > > > > > > the >> > > > > > > > > > KIP >> > > > > > > > > > > > to >> > > > > > > > > > > > > > > justify why using UpdateMetadata is more >> > favorable? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 8. I was under the impression that user could >> > > > > configure a >> > > > > > > > range >> > > > > > > > > > of >> > > > > > > > > > > > > > > supported versions, what's the trade-off for >> > > allowing >> > > > > > > single >> > > > > > > > > > > > finalized >> > > > > > > > > > > > > > > version only? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the >> > > "client" >> > > > > here >> > > > > > > may >> > > > > > > > > be >> > > > > > > > > > a >> > > > > > > > > > > > > > producer >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Boyang >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe < >> > > > > > > > > cmcc...@apache.org >> > > > > > > > > > > >> > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik >> > Prakasam >> > > > > wrote: >> > > > > > > > > > > > > > > > > Hi Colin, >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Thanks for the feedback! I've changed the >> KIP >> > > to >> > > > > > > address >> > > > > > > > > your >> > > > > > > > > > > > > > > > > suggestions. >> > > > > > > > > > > > > > > > > Please find below my explanation. Here is >> a >> > > link >> > > > to >> > > > > > KIP >> > > > > > > > > 584: >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features >> > > > > > > > > > > > > > > > > . >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 1. '__data_version__' is the version of >> the >> > > > > finalized >> > > > > > > > > feature >> > > > > > > > > > > > > > metadata >> > > > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the >> > > > > > > > > > '__schema_version__' >> > > > > > > > > > > is >> > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > version of the schema of the data >> persisted >> > in >> > > > ZK. >> > > > > > > These >> > > > > > > > > > serve >> > > > > > > > > > > > > > > different >> > > > > > > > > > > > > > > > > purposes. '__data_version__' is is useful >> > > mainly >> > > > to >> > > > > > > > clients >> > > > > > > > > > > > during >> > > > > > > > > > > > > > > reads, >> > > > > > > > > > > > > > > > > to differentiate between the 2 versions of >> > > > > eventually >> > > > > > > > > > > consistent >> > > > > > > > > > > > > > > > 'finalized >> > > > > > > > > > > > > > > > > features' metadata (i.e. larger metadata >> > > version >> > > > is >> > > > > > > more >> > > > > > > > > > > recent). >> > > > > > > > > > > > > > > > > '__schema_version__' provides an >> additional >> > > > degree >> > > > > of >> > > > > > > > > > > > flexibility, >> > > > > > > > > > > > > > > where >> > > > > > > > > > > > > > > > if >> > > > > > > > > > > > > > > > > we decide to change the schema for >> > '/features' >> > > > node >> > > > > > in >> > > > > > > ZK >> > > > > > > > > (in >> > > > > > > > > > > the >> > > > > > > > > > > > > > > > future), >> > > > > > > > > > > > > > > > > then we can manage broker roll outs >> suitably >> > > > (i.e. >> > > > > > > > > > > > > > > > > serialization/deserialization of the ZK >> data >> > > can >> > > > be >> > > > > > > > handled >> > > > > > > > > > > > > safely). >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hi Kowshik, >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > If you're talking about a number that lets >> you >> > > know >> > > > > if >> > > > > > > data >> > > > > > > > > is >> > > > > > > > > > > more >> > > > > > > > > > > > > or >> > > > > > > > > > > > > > > > less recent, we would typically call that an >> > > epoch, >> > > > > and >> > > > > > > > not a >> > > > > > > > > > > > > version. >> > > > > > > > > > > > > > > For >> > > > > > > > > > > > > > > > the ZK data structures, the word "version" >> is >> > > > > typically >> > > > > > > > > > reserved >> > > > > > > > > > > > for >> > > > > > > > > > > > > > > > describing changes to the overall schema of >> the >> > > > data >> > > > > > that >> > > > > > > > is >> > > > > > > > > > > > written >> > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > ZooKeeper. We don't even really change the >> > > > "version" >> > > > > > of >> > > > > > > > > those >> > > > > > > > > > > > > schemas >> > > > > > > > > > > > > > > that >> > > > > > > > > > > > > > > > much, since most changes are >> > > backwards-compatible. >> > > > > But >> > > > > > > we >> > > > > > > > do >> > > > > > > > > > > > include >> > > > > > > > > > > > > > > that >> > > > > > > > > > > > > > > > version field just in case. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > I don't think we really need an epoch here, >> > > though, >> > > > > > since >> > > > > > > > we >> > > > > > > > > > can >> > > > > > > > > > > > just >> > > > > > > > > > > > > > > look >> > > > > > > > > > > > > > > > at the broker epoch. Whenever the broker >> > > > registers, >> > > > > > its >> > > > > > > > > epoch >> > > > > > > > > > > will >> > > > > > > > > > > > > be >> > > > > > > > > > > > > > > > greater than the previous broker epoch. And >> > the >> > > > > newly >> > > > > > > > > > registered >> > > > > > > > > > > > > data >> > > > > > > > > > > > > > > will >> > > > > > > > > > > > > > > > take priority. This will be a lot simpler >> than >> > > > > adding >> > > > > > a >> > > > > > > > > > separate >> > > > > > > > > > > > > epoch >> > > > > > > > > > > > > > > > system, I think. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 2. Regarding admin client needing min and >> max >> > > > > > > > information - >> > > > > > > > > > you >> > > > > > > > > > > > are >> > > > > > > > > > > > > > > > right! >> > > > > > > > > > > > > > > > > I've changed the KIP such that the Admin >> API >> > > also >> > > > > > > allows >> > > > > > > > > the >> > > > > > > > > > > user >> > > > > > > > > > > > > to >> > > > > > > > > > > > > > > read >> > > > > > > > > > > > > > > > > 'supported features' from a specific >> broker. >> > > > Please >> > > > > > > look >> > > > > > > > at >> > > > > > > > > > the >> > > > > > > > > > > > > > section >> > > > > > > > > > > > > > > > > "Admin API changes". >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` >> - it >> > > was >> > > > > not >> > > > > > > > > > > deliberate. >> > > > > > > > > > > > > > I've >> > > > > > > > > > > > > > > > > improved the KIP to just use `long` at all >> > > > places. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Sounds good. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand >> tool >> > - >> > > > you >> > > > > > are >> > > > > > > > > right! >> > > > > > > > > > > > I've >> > > > > > > > > > > > > > > > updated >> > > > > > > > > > > > > > > > > the KIP sketching the functionality >> provided >> > by >> > > > > this >> > > > > > > > tool, >> > > > > > > > > > with >> > > > > > > > > > > > > some >> > > > > > > > > > > > > > > > > examples. Please look at the section >> "Tooling >> > > > > support >> > > > > > > > > > > examples". >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Thank you! >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks, Kowshik. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > cheers, >> > > > > > > > > > > > > > > > Colin >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Cheers, >> > > > > > > > > > > > > > > > > Kowshik >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin >> > McCabe < >> > > > > > > > > > > > cmcc...@apache.org> >> > > > > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > In the "Schema" section, do we really >> need >> > > both >> > > > > > > > > > > > > __schema_version__ >> > > > > > > > > > > > > > > and >> > > > > > > > > > > > > > > > > > __data_version__? Can we just have a >> > single >> > > > > > version >> > > > > > > > > field >> > > > > > > > > > > > here? >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function >> have >> > > some >> > > > > way >> > > > > > to >> > > > > > > > get >> > > > > > > > > > the >> > > > > > > > > > > > min >> > > > > > > > > > > > > > and >> > > > > > > > > > > > > > > > max >> > > > > > > > > > > > > > > > > > information that we're exposing as >> well? I >> > > > guess >> > > > > > we >> > > > > > > > > could >> > > > > > > > > > > have >> > > > > > > > > > > > > > min, >> > > > > > > > > > > > > > > > max, >> > > > > > > > > > > > > > > > > > and current. Unrelated: is the use of >> Long >> > > > > rather >> > > > > > > than >> > > > > > > > > > long >> > > > > > > > > > > > > > > deliberate >> > > > > > > > > > > > > > > > > > here? >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > It would be good to describe how the >> > command >> > > > line >> > > > > > > tool >> > > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work. >> For >> > > > > example >> > > > > > > the >> > > > > > > > > > flags >> > > > > > > > > > > > that >> > > > > > > > > > > > > > it >> > > > > > > > > > > > > > > > will >> > > > > > > > > > > > > > > > > > take and the output that it will >> generate >> > to >> > > > > > STDOUT. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > cheers, >> > > > > > > > > > > > > > > > > > Colin >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik >> > > > Prakasam >> > > > > > > wrote: >> > > > > > > > > > > > > > > > > > > Hi all, >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > I've opened KIP-584 >> > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> < >> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584 >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > which >> > > > > > > > > > > > > > > > > > > is intended to provide a versioning >> > scheme >> > > > for >> > > > > > > > > features. >> > > > > > > > > > > I'd >> > > > > > > > > > > > > like >> > > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > use >> > > > > > > > > > > > > > > > > > > this thread to discuss the same. I'd >> > > > appreciate >> > > > > > any >> > > > > > > > > > > feedback >> > > > > > > > > > > > on >> > > > > > > > > > > > > > > this. >> > > > > > > > > > > > > > > > > > > Here >> > > > > > > > > > > > > > > > > > > is a link to KIP-584 >> > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>: >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features >> > > > > > > > > > > > > > > > > > > . >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Thank you! >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Cheers, >> > > > > > > > > > > > > > > > > > > Kowshik >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >