Hi, Kowshik, Looks good to me now. Just a couple of minor things below.
200. In the validation section, there is still the text "*from* {"max_version_level": X} *to* {"max_version_level": X’}". It seems that it should say "from X to Y"? 110. Could we add that we need to document the bumped version of each feature in the upgrade section of a release? Thanks, Jun On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > Thank you for the suggestion! I have updated the KIP, please find my > response below. > > > 200. I guess you are saying only when the allowDowngrade field is set, > the > > finalized feature version can go backward. Otherwise, it can only go up. > > That makes sense. It would be useful to make that clear when explaining > > the usage of the allowDowngrade field. In the validation section, we > have " > > /features' from {"max_version_level": X} to {"max_version_level": X’}", > it > > seems that we need to mention Y there. > > (Kowshik): Great point! Yes, that is correct. Done, I have updated the > validations > section explaining the above. Here is a link to this section: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations > > > Cheers, > Kowshik > > > > > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <j...@confluent.io> wrote: > > > Hi, Kowshik, > > > > 200. I guess you are saying only when the allowDowngrade field is set, > the > > finalized feature version can go backward. Otherwise, it can only go up. > > That makes sense. It would be useful to make that clear when explaining > > the usage of the allowDowngrade field. In the validation section, we have > > " > > /features' from {"max_version_level": X} to {"max_version_level": X’}", > it > > seems that we need to mention Y there. > > > > Thanks, > > > > Jun > > > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam < > kpraka...@confluent.io> > > wrote: > > > > > Hi Jun, > > > > > > Great question! Please find my response below. > > > > > > > 200. My understanding is that If the CLI tool passes the > > > > '--allow-downgrade' flag when updating a specific feature, then a > > future > > > > downgrade is possible. Otherwise, the feature is now downgradable. If > > so, > > > I > > > > was wondering how the controller remembers this since it can be > > restarted > > > > over time? > > > > > > (Kowshik): The purpose of the flag was to just restrict the user intent > > for > > > a specific request. > > > It seems to me that to avoid confusion, I could call the flag as > > > `--try-downgrade` instead. > > > Then this makes it clear, that, the controller just has to consider the > > ask > > > from > > > the user as an explicit request to attempt a downgrade. > > > > > > The flag does not act as an override on controller's decision making > that > > > decides whether > > > a flag is downgradable (these decisions on whether to allow a flag to > be > > > downgraded > > > from a specific version level, can be embedded in the controller code). > > > > > > Please let me know what you think. > > > Sorry if I misunderstood the original question. > > > > > > > > > Cheers, > > > Kowshik > > > > > > > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <j...@confluent.io> wrote: > > > > > > > Hi, Kowshik, > > > > > > > > Thanks for the reply. Makes sense. Just one more question. > > > > > > > > 200. My understanding is that If the CLI tool passes the > > > > '--allow-downgrade' flag when updating a specific feature, then a > > future > > > > downgrade is possible. Otherwise, the feature is now downgradable. If > > > so, I > > > > was wondering how the controller remembers this since it can be > > restarted > > > > over time? > > > > > > > > Jun > > > > > > > > > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam < > > kpraka...@confluent.io > > > > > > > > wrote: > > > > > > > > > Hi Jun, > > > > > > > > > > Thanks a lot for the feedback and the questions! > > > > > Please find my response below. > > > > > > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. > It > > > > seems > > > > > > that field needs to be persisted somewhere in ZK? > > > > > > > > > > (Kowshik): Great question! Below is my explanation. Please help me > > > > > understand, > > > > > if you feel there are cases where we would need to still persist it > > in > > > > ZK. > > > > > > > > > > Firstly I have updated my thoughts into the KIP now, under the > > > > 'guidelines' > > > > > section: > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > > > > > > > > > > The allowDowngrade boolean field is just to restrict the user > intent, > > > and > > > > > to remind > > > > > them to double check their intent before proceeding. It should be > set > > > to > > > > > true > > > > > by the user in a request, only when the user intent is to > forcefully > > > > > "attempt" a > > > > > downgrade of a specific feature's max version level, to the > provided > > > > value > > > > > in > > > > > the request. > > > > > > > > > > We can extend this safeguard. The controller (on it's end) can > > maintain > > > > > rules in the code, that, for safety reasons would outright reject > > > certain > > > > > downgrades > > > > > from a specific max_version_level for a specific feature. Such > > > rejections > > > > > may > > > > > happen depending on the feature being downgraded, and from what > > version > > > > > level. > > > > > > > > > > The CLI tool only allows a downgrade attempt in conjunction with > > > specific > > > > > flags and sub-commands. For example, in the CLI tool, if the user > > uses > > > > the > > > > > 'downgrade-all' command, or passes '--allow-downgrade' flag when > > > > updating a > > > > > specific feature, only then the tool will translate this ask to > > setting > > > > > 'allowDowngrade' field in the request to the server. > > > > > > > > > > > 201. UpdateFeaturesResponse has the following top level fields. > > > Should > > > > > > those fields be per feature? > > > > > > > > > > > > "fields": [ > > > > > > { "name": "ErrorCode", "type": "int16", "versions": "0+", > > > > > > "about": "The error code, or 0 if there was no error." }, > > > > > > { "name": "ErrorMessage", "type": "string", "versions": "0+", > > > > > > "about": "The error message, or null if there was no > error." > > } > > > > > > ] > > > > > > > > > > (Kowshik): Great question! > > > > > As such, the API is transactional, as explained in the sections > > linked > > > > > below. > > > > > Either all provided FeatureUpdate was applied, or none. > > > > > It's the reason I felt we can have just one error code + message. > > > > > Happy to extend this if you feel otherwise. Please let me know. > > > > > > > > > > Link to sections: > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees > > > > > > > > > > > 202. The /features path in ZK has a field min_version_level. > Which > > > API > > > > > and > > > > > > tool can change that value? > > > > > > > > > > (Kowshik): Great question! Currently this cannot be modified by > using > > > the > > > > > API or the tool. > > > > > Feature version deprecation (by raising min_version_level) can be > > done > > > > only > > > > > by the Controller directly. The rationale is explained in this > > section: > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation > > > > > > > > > > > > > > > Cheers, > > > > > Kowshik > > > > > > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <j...@confluent.io> wrote: > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > Thanks for addressing those comments. Just a few more minor > > comments. > > > > > > > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade field. > It > > > > seems > > > > > > that field needs to be persisted somewhere in ZK? > > > > > > > > > > > > 201. UpdateFeaturesResponse has the following top level fields. > > > Should > > > > > > those fields be per feature? > > > > > > > > > > > > "fields": [ > > > > > > { "name": "ErrorCode", "type": "int16", "versions": "0+", > > > > > > "about": "The error code, or 0 if there was no error." }, > > > > > > { "name": "ErrorMessage", "type": "string", "versions": "0+", > > > > > > "about": "The error message, or null if there was no > error." > > } > > > > > > ] > > > > > > > > > > > > 202. The /features path in ZK has a field min_version_level. > Which > > > API > > > > > and > > > > > > tool can change that value? > > > > > > > > > > > > Jun > > > > > > > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam < > > > > kpraka...@confluent.io > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > Thanks for the feedback! I have updated the KIP-584 addressing > > your > > > > > > > comments. > > > > > > > Please find my response below. > > > > > > > > > > > > > > > 100.6 You can look for the sentence "This operation requires > > > ALTER > > > > on > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in > > > > > > > > KafkaApis.authorize(). > > > > > > > > > > > > > > (Kowshik): Done. Great point! For the newly introduced > > > > UPDATE_FEATURES > > > > > > api, > > > > > > > I have added a > > > > > > > requirement that AclOperation.ALTER is required on > > > > > ResourceType.CLUSTER. > > > > > > > > > > > > > > > 110. Keeping the feature version as int is probably fine. I > > just > > > > felt > > > > > > > that > > > > > > > > for some of the common user interactions, it's more > convenient > > to > > > > > > > > relate that to a release version. For example, if a user > wants > > to > > > > > > > downgrade > > > > > > > > to a release 2.5, it's easier for the user to use the tool > like > > > > "tool > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X > > > --version > > > > > 6". > > > > > > > > > > > > > > (Kowshik): Great point. Generally, maximum feature version > levels > > > are > > > > > not > > > > > > > downgradable after > > > > > > > they are finalized in the cluster. This is because, as a > > guideline > > > > > > bumping > > > > > > > feature version level usually is used mainly to convey > important > > > > > breaking > > > > > > > changes. > > > > > > > Despite the above, there may be some extreme/rare cases where a > > > user > > > > > > wants > > > > > > > to downgrade > > > > > > > all features to a specific previous release. The user may want > to > > > do > > > > > this > > > > > > > just > > > > > > > prior to rolling back a Kafka cluster to a previous release. > > > > > > > > > > > > > > To support the above, I have made a change to the KIP > explaining > > > that > > > > > the > > > > > > > CLI tool is versioned. > > > > > > > The CLI tool internally has knowledge about a map of features > to > > > > their > > > > > > > respective max > > > > > > > versions supported by the Broker. The tool's knowledge of > > features > > > > and > > > > > > > their version values, > > > > > > > is limited to the version of the CLI tool itself i.e. the > > > information > > > > > is > > > > > > > packaged into the CLI tool > > > > > > > when it is released. Whenever a Kafka release introduces a new > > > > feature > > > > > > > version, or modifies > > > > > > > an existing feature version, the CLI tool shall also be updated > > > with > > > > > this > > > > > > > information, > > > > > > > Newer versions of the CLI tool will be released as part of the > > > Kafka > > > > > > > releases. > > > > > > > > > > > > > > Therefore, to achieve the downgrade need, the user just needs > to > > > run > > > > > the > > > > > > > version of > > > > > > > the CLI tool that's part of the particular previous release > that > > > > he/she > > > > > > is > > > > > > > downgrading to. > > > > > > > To help the user with this, there is a new command added to the > > CLI > > > > > tool > > > > > > > called `downgrade-all`. > > > > > > > This essentially downgrades max version levels of all features > in > > > the > > > > > > > cluster to the versions > > > > > > > known to the CLI tool internally. > > > > > > > > > > > > > > I have explained the above in the KIP under these sections: > > > > > > > > > > > > > > Tooling support (have explained that the CLI tool is > versioned): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > > > > > > > > > > > > > > Regular CLI tool usage (please refer to point #3, and see the > > > tooling > > > > > > > example) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > > > > > > > > > > > > > > > 110. Similarly, if the client library finds a feature > mismatch > > > with > > > > > the > > > > > > > broker, > > > > > > > > the client likely needs to log some error message for the > user > > to > > > > > take > > > > > > > some > > > > > > > > actions. It's much more actionable if the error message is > > > "upgrade > > > > > the > > > > > > > > broker to release version 2.6" than just "upgrade the broker > to > > > > > feature > > > > > > > > version 7". > > > > > > > > > > > > > > (Kowshik): That's a really good point! If we use ints for > feature > > > > > > versions, > > > > > > > the best > > > > > > > message that client can print for debugging is "broker doesn't > > > > support > > > > > > > feature version 7", and alongside that print the supported > > version > > > > > range > > > > > > > returned > > > > > > > by the broker. Then, does it sound reasonable that the user > could > > > > then > > > > > > > reference > > > > > > > Kafka release logs to figure out which version of the broker > > > release > > > > is > > > > > > > required > > > > > > > be deployed, to support feature version 7? I couldn't think of > a > > > > better > > > > > > > strategy here. > > > > > > > > > > > > > > > 120. When should a developer bump up the version of a > feature? > > > > > > > > > > > > > > (Kowshik): Great question! In the KIP, I have added a section: > > > > > > 'Guidelines > > > > > > > on feature versions and workflows' > > > > > > > providing some guidelines on when to use the versioned feature > > > flags, > > > > > and > > > > > > > what > > > > > > > are the regular workflows with the CLI tool. > > > > > > > > > > > > > > Link to the relevant sections: > > > > > > > Guidelines: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > > > > > > > > > > > > > > Regular CLI tool usage: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > > > > > > > > > > > > > > Advanced CLI tool usage: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > Thanks for the reply. A few more comments. > > > > > > > > > > > > > > > > 110. Keeping the feature version as int is probably fine. I > > just > > > > felt > > > > > > > that > > > > > > > > for some of the common user interactions, it's more > convenient > > to > > > > > > > > relate that to a release version. For example, if a user > wants > > to > > > > > > > downgrade > > > > > > > > to a release 2.5, it's easier for the user to use the tool > like > > > > "tool > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature X > > > --version > > > > > 6". > > > > > > > > Similarly, if the client library finds a feature mismatch > with > > > the > > > > > > > broker, > > > > > > > > the client likely needs to log some error message for the > user > > to > > > > > take > > > > > > > some > > > > > > > > actions. It's much more actionable if the error message is > > > "upgrade > > > > > the > > > > > > > > broker to release version 2.6" than just "upgrade the broker > to > > > > > feature > > > > > > > > version 7". > > > > > > > > > > > > > > > > 111. Sounds good. > > > > > > > > > > > > > > > > 120. When should a developer bump up the version of a > feature? > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam < > > > > > > kpraka...@confluent.io > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > > > > > I have updated the KIP for the item 111. > > > > > > > > > I'm in the process of addressing 100.6, and will provide an > > > > update > > > > > > > soon. > > > > > > > > > I think item 110 is still under discussion given we are now > > > > > > providing a > > > > > > > > way > > > > > > > > > to finalize > > > > > > > > > all features to their latest version levels. In any case, > > > please > > > > > let > > > > > > us > > > > > > > > > know > > > > > > > > > how you feel in response to Colin's comments on this topic. > > > > > > > > > > > > > > > > > > > 111. To put this in context, when we had IBP, the default > > > value > > > > > is > > > > > > > the > > > > > > > > > > current released version. So, if you are a brand new > user, > > > you > > > > > > don't > > > > > > > > need > > > > > > > > > > to configure IBP and all new features will be immediately > > > > > available > > > > > > > in > > > > > > > > > the > > > > > > > > > > new cluster. If you are upgrading from an old version, > you > > do > > > > > need > > > > > > to > > > > > > > > > > understand and configure IBP. I see a similar pattern > here > > > for > > > > > > > > > > features. From the ease of use perspective, ideally, we > > > > shouldn't > > > > > > > > require > > > > > > > > > a > > > > > > > > > > new user to have an extra step such as running a > bootstrap > > > > script > > > > > > > > unless > > > > > > > > > > it's truly necessary. If someone has a special need (all > > the > > > > > cases > > > > > > > you > > > > > > > > > > mentioned seem special cases?), they can configure a mode > > > such > > > > > that > > > > > > > > > > features are enabled/disabled manually. > > > > > > > > > > > > > > > > > > (Kowshik): That makes sense, thanks for the idea! Sorry if > I > > > > didn't > > > > > > > > > understand > > > > > > > > > this need earlier. I have updated the KIP with the approach > > > that > > > > > > > whenever > > > > > > > > > the '/features' node is absent, the controller by default > > will > > > > > > > bootstrap > > > > > > > > > the node > > > > > > > > > to contain the latest feature levels. Here is the new > section > > > in > > > > > the > > > > > > > KIP > > > > > > > > > describing > > > > > > > > > the same: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues > > > > > > > > > > > > > > > > > > Next, as I explained in my response to Colin's suggestions, > > we > > > > are > > > > > > now > > > > > > > > > providing a `--finalize-latest-features` flag with the > > tooling. > > > > > This > > > > > > > lets > > > > > > > > > the sysadmin finalize all features known to the controller > to > > > > their > > > > > > > > latest > > > > > > > > > version > > > > > > > > > levels. Please look at this section (point #3 and the > tooling > > > > > example > > > > > > > > > later): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > > > > > > > > > > > > > > > > > > > > > > > > > > > Do you feel this addresses your comment/concern? > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <j...@confluent.io> > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few more replies below. > > > > > > > > > > > > > > > > > > > > 100.6 You can look for the sentence "This operation > > requires > > > > > ALTER > > > > > > on > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in > > > > > > > > > > KafkaApis.authorize(). > > > > > > > > > > > > > > > > > > > > 110. From the external client/tooling perspective, it's > > more > > > > > > natural > > > > > > > to > > > > > > > > > use > > > > > > > > > > the release version for features. If we can use the same > > > > release > > > > > > > > version > > > > > > > > > > for internal representation, it seems simpler (easier to > > > > > > understand, > > > > > > > no > > > > > > > > > > mapping overhead, etc). Is there a benefit with separate > > > > external > > > > > > and > > > > > > > > > > internal versioning schemes? > > > > > > > > > > > > > > > > > > > > 111. To put this in context, when we had IBP, the default > > > value > > > > > is > > > > > > > the > > > > > > > > > > current released version. So, if you are a brand new > user, > > > you > > > > > > don't > > > > > > > > need > > > > > > > > > > to configure IBP and all new features will be immediately > > > > > available > > > > > > > in > > > > > > > > > the > > > > > > > > > > new cluster. If you are upgrading from an old version, > you > > do > > > > > need > > > > > > to > > > > > > > > > > understand and configure IBP. I see a similar pattern > here > > > for > > > > > > > > > > features. From the ease of use perspective, ideally, we > > > > shouldn't > > > > > > > > > require a > > > > > > > > > > new user to have an extra step such as running a > bootstrap > > > > script > > > > > > > > unless > > > > > > > > > > it's truly necessary. If someone has a special need (all > > the > > > > > cases > > > > > > > you > > > > > > > > > > mentioned seem special cases?), they can configure a mode > > > such > > > > > that > > > > > > > > > > features are enabled/disabled manually. > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam < > > > > > > > > kpraka...@confluent.io> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback and suggestions. Please find my > > > > > response > > > > > > > > below. > > > > > > > > > > > > > > > > > > > > > > > 100.6 For every new request, the admin needs to > control > > > who > > > > > is > > > > > > > > > allowed > > > > > > > > > > to > > > > > > > > > > > > issue that request if security is enabled. So, we > need > > to > > > > > > assign > > > > > > > > the > > > > > > > > > > new > > > > > > > > > > > > request a ResourceType and possible AclOperations. > See > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > > > > > > > > > > > > as an example. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): I don't see any reference to the words > > > > ResourceType > > > > > or > > > > > > > > > > > AclOperations > > > > > > > > > > > in the KIP. Please let me know how I can use the KIP > that > > > you > > > > > > > linked > > > > > > > > to > > > > > > > > > > > know how to > > > > > > > > > > > setup the appropriate ResourceType and/or > > ClusterOperation? > > > > > > > > > > > > > > > > > > > > > > > 105. If we change delete to disable, it's better to > do > > > this > > > > > > > > > > consistently > > > > > > > > > > > in > > > > > > > > > > > > request protocol and admin api as well. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' when > it > > is > > > > > > > deleting > > > > > > > > a > > > > > > > > > > > feature. > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't > have a > > > > > strong > > > > > > > > > > > preference. > > > > > > > > > > > > > > > > > > > > > > > 110. The minVersion/maxVersion for features use > int64. > > > > > > Currently, > > > > > > > > our > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g. > > > 2.5.0). > > > > > It's > > > > > > > > > > possible > > > > > > > > > > > > for new features to be included in minor releases > too. > > > > Should > > > > > > we > > > > > > > > make > > > > > > > > > > the > > > > > > > > > > > > feature versioning match the release versioning? > > > > > > > > > > > > > > > > > > > > > > (Kowshik): The release version can be mapped to a set > of > > > > > feature > > > > > > > > > > versions, > > > > > > > > > > > and this can be done, for example in the tool (or even > > > > external > > > > > > to > > > > > > > > the > > > > > > > > > > > tool). > > > > > > > > > > > Can you please clarify what I'm missing? > > > > > > > > > > > > > > > > > > > > > > > 111. "During regular operations, the data in the ZK > > node > > > > can > > > > > be > > > > > > > > > mutated > > > > > > > > > > > > only via a specific admin API served only by the > > > > > controller." I > > > > > > > am > > > > > > > > > > > > wondering why can't the controller auto finalize a > > > feature > > > > > > > version > > > > > > > > > > after > > > > > > > > > > > > all brokers are upgraded? For new users who download > > the > > > > > latest > > > > > > > > > version > > > > > > > > > > > to > > > > > > > > > > > > build a new cluster, it's inconvenient for them to > have > > > to > > > > > > > manually > > > > > > > > > > > enable > > > > > > > > > > > > each feature. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here, but > it > > > > will > > > > > > help > > > > > > > > > > > to decide whether the automation can be thought through > > in > > > > the > > > > > > > future > > > > > > > > > > > in a follow up KIP, or right now in this KIP. We may > > invest > > > > > > > > > > > in automation, but we have to decide whether we should > do > > > it > > > > > > > > > > > now or later. > > > > > > > > > > > > > > > > > > > > > > For the inconvenience that you mentioned, do you think > > the > > > > > > problem > > > > > > > > that > > > > > > > > > > you > > > > > > > > > > > mentioned can be overcome by asking for the cluster > > > operator > > > > > to > > > > > > > run > > > > > > > > a > > > > > > > > > > > bootstrap script when he/she knows that a specific AK > > > > release > > > > > > has > > > > > > > > been > > > > > > > > > > > almost completely deployed in a cluster for the first > > time? > > > > > Idea > > > > > > is > > > > > > > > > that > > > > > > > > > > > the > > > > > > > > > > > bootstrap script will know how to map a specific AK > > release > > > > to > > > > > > > > > finalized > > > > > > > > > > > feature versions, and run the `kafka-features.sh` tool > > > > > > > appropriately > > > > > > > > > > > against > > > > > > > > > > > the cluster. > > > > > > > > > > > > > > > > > > > > > > Now, coming back to your automation proposal/question. > > > > > > > > > > > I do see the value of automated feature version > > > finalization, > > > > > > but I > > > > > > > > > also > > > > > > > > > > > see > > > > > > > > > > > that this will open up several questions and some > risks, > > as > > > > > > > explained > > > > > > > > > > > below. > > > > > > > > > > > The answers to these depend on the definition of the > > > > automation > > > > > > we > > > > > > > > > choose > > > > > > > > > > > to build, and how well does it fit into a kafka > > deployment. > > > > > > > > > > > Basically, it can be unsafe for the controller to > > finalize > > > > > > feature > > > > > > > > > > version > > > > > > > > > > > upgrades automatically, without learning about the > intent > > > of > > > > > the > > > > > > > > > cluster > > > > > > > > > > > operator. > > > > > > > > > > > 1. We would sometimes want to lock feature versions > only > > > when > > > > > we > > > > > > > have > > > > > > > > > > > externally verified > > > > > > > > > > > the stability of the broker binary. > > > > > > > > > > > 2. Sometimes only the cluster operator knows that a > > cluster > > > > > > upgrade > > > > > > > > is > > > > > > > > > > > complete, > > > > > > > > > > > and new brokers are highly unlikely to join the > cluster. > > > > > > > > > > > 3. Only the cluster operator knows that the intent is > to > > > > deploy > > > > > > the > > > > > > > > > same > > > > > > > > > > > version > > > > > > > > > > > of the new broker release across the entire cluster > (i.e. > > > the > > > > > > > latest > > > > > > > > > > > downloaded version). > > > > > > > > > > > 4. For downgrades, it appears the controller still > needs > > > some > > > > > > > > external > > > > > > > > > > > input > > > > > > > > > > > (such as the proposed tool) to finalize a feature > version > > > > > > > downgrade. > > > > > > > > > > > > > > > > > > > > > > If we have automation, that automation can end up > failing > > > in > > > > > some > > > > > > > of > > > > > > > > > the > > > > > > > > > > > cases > > > > > > > > > > > above. Then, we need a way to declare that the cluster > is > > > > "not > > > > > > > ready" > > > > > > > > > if > > > > > > > > > > > the > > > > > > > > > > > controller cannot automatically finalize some basic > > > required > > > > > > > feature > > > > > > > > > > > version > > > > > > > > > > > upgrades across the cluster. We need to make the > cluster > > > > > operator > > > > > > > > aware > > > > > > > > > > in > > > > > > > > > > > such a scenario (raise an alert or alike). > > > > > > > > > > > > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey > should > > > be > > > > 49 > > > > > > > > instead > > > > > > > > > > of > > > > > > > > > > > 48. > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Done. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao < > > j...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few more comments below. > > > > > > > > > > > > > > > > > > > > > > > > 100.6 For every new request, the admin needs to > control > > > who > > > > > is > > > > > > > > > allowed > > > > > > > > > > to > > > > > > > > > > > > issue that request if security is enabled. So, we > need > > to > > > > > > assign > > > > > > > > the > > > > > > > > > > new > > > > > > > > > > > > request a ResourceType and possible AclOperations. > See > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > > > > > > > > > > > > as > > > > > > > > > > > > an example. > > > > > > > > > > > > > > > > > > > > > > > > 105. If we change delete to disable, it's better to > do > > > this > > > > > > > > > > consistently > > > > > > > > > > > in > > > > > > > > > > > > request protocol and admin api as well. > > > > > > > > > > > > > > > > > > > > > > > > 110. The minVersion/maxVersion for features use > int64. > > > > > > Currently, > > > > > > > > our > > > > > > > > > > > > release version schema is major.minor.bugfix (e.g. > > > 2.5.0). > > > > > It's > > > > > > > > > > possible > > > > > > > > > > > > for new features to be included in minor releases > too. > > > > Should > > > > > > we > > > > > > > > make > > > > > > > > > > the > > > > > > > > > > > > feature versioning match the release versioning? > > > > > > > > > > > > > > > > > > > > > > > > 111. "During regular operations, the data in the ZK > > node > > > > can > > > > > be > > > > > > > > > mutated > > > > > > > > > > > > only via a specific admin API served only by the > > > > > controller." I > > > > > > > am > > > > > > > > > > > > wondering why can't the controller auto finalize a > > > feature > > > > > > > version > > > > > > > > > > after > > > > > > > > > > > > all brokers are upgraded? For new users who download > > the > > > > > latest > > > > > > > > > version > > > > > > > > > > > to > > > > > > > > > > > > build a new cluster, it's inconvenient for them to > have > > > to > > > > > > > manually > > > > > > > > > > > enable > > > > > > > > > > > > each feature. > > > > > > > > > > > > > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey > should > > > be > > > > 49 > > > > > > > > instead > > > > > > > > > > of > > > > > > > > > > > > 48. > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam < > > > > > > > > > > kpraka...@confluent.io> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hey Jun, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the great feedback! Please note > that > > > the > > > > > > > design > > > > > > > > > > > > > has changed a little bit on the KIP, and we now > > > propagate > > > > > the > > > > > > > > > > finalized > > > > > > > > > > > > > features metadata only via ZK watches (instead of > > > > > > > > > > UpdateMetadataRequest > > > > > > > > > > > > > from the controller). > > > > > > > > > > > > > > > > > > > > > > > > > > Please find below my response to your > > > questions/feedback, > > > > > > with > > > > > > > > the > > > > > > > > > > > prefix > > > > > > > > > > > > > "(Kowshik):". > > > > > > > > > > > > > > > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse > > > > > > > > > > > > > > 100.1 Since this request waits for responses from > > > > > brokers, > > > > > > > > should > > > > > > > > > > we > > > > > > > > > > > > add > > > > > > > > > > > > > a > > > > > > > > > > > > > > timeout in the request (like createTopicRequest)? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a > timeout > > > > field. > > > > > > > Note: > > > > > > > > > we > > > > > > > > > > no > > > > > > > > > > > > > longer > > > > > > > > > > > > > wait for responses from brokers, since the design > has > > > > been > > > > > > > > changed > > > > > > > > > so > > > > > > > > > > > > that > > > > > > > > > > > > > the > > > > > > > > > > > > > features information is propagated via ZK. > > > Nevertheless, > > > > it > > > > > > is > > > > > > > > > right > > > > > > > > > > to > > > > > > > > > > > > > have a timeout > > > > > > > > > > > > > for the request. > > > > > > > > > > > > > > > > > > > > > > > > > > > 100.2 The response schema is a bit weird. > > Typically, > > > > the > > > > > > > > response > > > > > > > > > > > just > > > > > > > > > > > > > > shows an error code and an error message, instead > > of > > > > > > echoing > > > > > > > > the > > > > > > > > > > > > request. > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified it to > > > just > > > > > > return > > > > > > > > an > > > > > > > > > > > error > > > > > > > > > > > > > code and a message. > > > > > > > > > > > > > Previously it was not echoing the "request", rather > > it > > > > was > > > > > > > > > returning > > > > > > > > > > > the > > > > > > > > > > > > > latest set of > > > > > > > > > > > > > cluster-wide finalized features (after applying the > > > > > updates). > > > > > > > But > > > > > > > > > you > > > > > > > > > > > are > > > > > > > > > > > > > right, > > > > > > > > > > > > > the additional info is not required, so I have > > removed > > > it > > > > > > from > > > > > > > > the > > > > > > > > > > > > response > > > > > > > > > > > > > schema. > > > > > > > > > > > > > > > > > > > > > > > > > > > 100.3 Should we add a separate request to > > > list/describe > > > > > the > > > > > > > > > > existing > > > > > > > > > > > > > > features? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): This is already present in the KIP via > the > > > > > > > > > > > 'DescribeFeatures' > > > > > > > > > > > > > Admin API, > > > > > > > > > > > > > which, underneath covers uses the > ApiVersionsRequest > > to > > > > > > > > > list/describe > > > > > > > > > > > the > > > > > > > > > > > > > existing features. Please read the 'Tooling > support' > > > > > section. > > > > > > > > > > > > > > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a > > > > single > > > > > > > > request. > > > > > > > > > > For > > > > > > > > > > > > > > DELETE, the version field doesn't make sense. > So, I > > > > guess > > > > > > the > > > > > > > > > > broker > > > > > > > > > > > > just > > > > > > > > > > > > > > ignores this? An alternative way is to have a > > > separate > > > > > > > > > > > > > DeleteFeaturesRequest > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP now > > to > > > > > have 2 > > > > > > > > > > separate > > > > > > > > > > > > > controller APIs > > > > > > > > > > > > > serving these different purposes: > > > > > > > > > > > > > 1. updateFeatures > > > > > > > > > > > > > 2. deleteFeatures > > > > > > > > > > > > > > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The > > > > > monotonically > > > > > > > > > > > increasing > > > > > > > > > > > > > > version of the metadata for finalized features." > I > > am > > > > > > > wondering > > > > > > > > > why > > > > > > > > > > > the > > > > > > > > > > > > > > ordering is important? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is called > > > epoch > > > > > > > > (instead > > > > > > > > > of > > > > > > > > > > > > > version), and > > > > > > > > > > > > > it is just the ZK node version. Basically, this is > > the > > > > > epoch > > > > > > > for > > > > > > > > > the > > > > > > > > > > > > > cluster-wide > > > > > > > > > > > > > finalized feature version metadata. This metadata > is > > > > served > > > > > > to > > > > > > > > > > clients > > > > > > > > > > > > via > > > > > > > > > > > > > the > > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate > updates > > > > from > > > > > > the > > > > > > > > > > > > '/features' > > > > > > > > > > > > > ZK node > > > > > > > > > > > > > to all brokers, via ZK watches setup by each broker > > on > > > > the > > > > > > > > > > '/features' > > > > > > > > > > > > > node. > > > > > > > > > > > > > > > > > > > > > > > > > > Now here is why the ordering is important: > > > > > > > > > > > > > ZK watches don't propagate at the same time. As a > > > result, > > > > > the > > > > > > > > > > > > > ApiVersionsResponse > > > > > > > > > > > > > is eventually consistent across brokers. This can > > > > introduce > > > > > > > cases > > > > > > > > > > > > > where clients see an older lower epoch of the > > features > > > > > > > metadata, > > > > > > > > > > after > > > > > > > > > > > a > > > > > > > > > > > > > more recent > > > > > > > > > > > > > higher epoch was returned at a previous point in > > time. > > > We > > > > > > > expect > > > > > > > > > > > clients > > > > > > > > > > > > > to always employ the rule that the latest received > > > higher > > > > > > epoch > > > > > > > > of > > > > > > > > > > > > metadata > > > > > > > > > > > > > always trumps an older smaller epoch. Those clients > > > that > > > > > are > > > > > > > > > external > > > > > > > > > > > to > > > > > > > > > > > > > Kafka should strongly consider discovering the > latest > > > > > > metadata > > > > > > > > once > > > > > > > > > > > > during > > > > > > > > > > > > > startup from the brokers, and if required refresh > the > > > > > > metadata > > > > > > > > > > > > periodically > > > > > > > > > > > > > (to get the latest metadata). > > > > > > > > > > > > > > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for this > > new > > > > > > > request? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find out > > which > > > > one > > > > > to > > > > > > > > > > specify? > > > > > > > > > > > > > Please could you provide me some pointers? I'll be > > glad > > > > to > > > > > > > update > > > > > > > > > the > > > > > > > > > > > > > KIP once I know the next steps. > > > > > > > > > > > > > > > > > > > > > > > > > > > 101. For the broker registration ZK node, should > we > > > > bump > > > > > up > > > > > > > the > > > > > > > > > > > version > > > > > > > > > > > > > in > > > > > > > > > > > > > the json? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased the > > > version > > > > in > > > > > > the > > > > > > > > > > broker > > > > > > > > > > > > json > > > > > > > > > > > > > by 1. > > > > > > > > > > > > > > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we > need > > > the > > > > > > epoch > > > > > > > > > > field. > > > > > > > > > > > > Each > > > > > > > > > > > > > > ZK node has an internal version field that is > > > > incremented > > > > > > on > > > > > > > > > every > > > > > > > > > > > > > update. > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK node > > > > version > > > > > > > now, > > > > > > > > > > > instead > > > > > > > > > > > > of > > > > > > > > > > > > > explicitly > > > > > > > > > > > > > incremented epoch. > > > > > > > > > > > > > > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature > > > > version > > > > > > > > > > cluster-wide > > > > > > > > > > > > is > > > > > > > > > > > > > > left to the discretion of the logic implementing > > the > > > > > > feature > > > > > > > > (ex: > > > > > > > > > > can > > > > > > > > > > > > be > > > > > > > > > > > > > > done via dynamic broker config)." Does that mean > > the > > > > > broker > > > > > > > > > > > > registration > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > node will be updated dynamically when this > happens? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Not really. The text was just conveying > > > that a > > > > > > > broker > > > > > > > > > > could > > > > > > > > > > > > > "know" of > > > > > > > > > > > > > a new feature version, but it does not mean the > > broker > > > > > should > > > > > > > > have > > > > > > > > > > also > > > > > > > > > > > > > activated the effects of the feature version. > Knowing > > > vs > > > > > > > > activation > > > > > > > > > > > are 2 > > > > > > > > > > > > > separate things, > > > > > > > > > > > > > and the latter can be achieved by dynamic config. I > > > have > > > > > > > reworded > > > > > > > > > the > > > > > > > > > > > > text > > > > > > > > > > > > > to > > > > > > > > > > > > > make this clear to the reader. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 104. UpdateMetadataRequest > > > > > > > > > > > > > > 104.1 It would be useful to describe when the > > feature > > > > > > > metadata > > > > > > > > is > > > > > > > > > > > > > included > > > > > > > > > > > > > > in the request. My understanding is that it's > only > > > > > included > > > > > > > if > > > > > > > > > (1) > > > > > > > > > > > > there > > > > > > > > > > > > > is > > > > > > > > > > > > > > a change to the finalized feature; (2) broker > > > restart; > > > > > (3) > > > > > > > > > > controller > > > > > > > > > > > > > > failover. > > > > > > > > > > > > > > 104.2 The new fields have the following versions. > > Why > > > > are > > > > > > the > > > > > > > > > > > versions > > > > > > > > > > > > 3+ > > > > > > > > > > > > > > when the top version is bumped to 6? > > > > > > > > > > > > > > "fields": [ > > > > > > > > > > > > > > {"name": "Name", "type": "string", > > > "versions": > > > > > > > "3+", > > > > > > > > > > > > > > "about": "The name of the feature."}, > > > > > > > > > > > > > > {"name": "Version", "type": "int64", > > > > > "versions": > > > > > > > > "3+", > > > > > > > > > > > > > > "about": "The finalized version for the > > > > > > feature."} > > > > > > > > > > > > > > ] > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): With the new improved design, we have > > > > completely > > > > > > > > > > eliminated > > > > > > > > > > > > the > > > > > > > > > > > > > need to > > > > > > > > > > > > > use UpdateMetadataRequest. This is because we now > > rely > > > on > > > > > ZK > > > > > > to > > > > > > > > > > deliver > > > > > > > > > > > > the > > > > > > > > > > > > > notifications for changes to the '/features' ZK > node. > > > > > > > > > > > > > > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using > > > update/delete, > > > > > > > perhaps > > > > > > > > > > it's > > > > > > > > > > > > > better > > > > > > > > > > > > > > to use enable/disable? > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it so > that > > > we > > > > > > > instead > > > > > > > > > call > > > > > > > > > > > it > > > > > > > > > > > > > 'disable'. > > > > > > > > > > > > > However for 'update', it can now also refer to > either > > > an > > > > > > > upgrade > > > > > > > > > or a > > > > > > > > > > > > > forced downgrade. > > > > > > > > > > > > > Therefore, I have left it the way it is, just > calling > > > it > > > > as > > > > > > > just > > > > > > > > > > > > 'update'. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao < > > > > j...@confluent.io> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A few > > > comments > > > > > > below. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse > > > > > > > > > > > > > > 100.1 Since this request waits for responses from > > > > > brokers, > > > > > > > > should > > > > > > > > > > we > > > > > > > > > > > > add > > > > > > > > > > > > > a > > > > > > > > > > > > > > timeout in the request (like createTopicRequest)? > > > > > > > > > > > > > > 100.2 The response schema is a bit weird. > > Typically, > > > > the > > > > > > > > response > > > > > > > > > > > just > > > > > > > > > > > > > > shows an error code and an error message, instead > > of > > > > > > echoing > > > > > > > > the > > > > > > > > > > > > request. > > > > > > > > > > > > > > 100.3 Should we add a separate request to > > > list/describe > > > > > the > > > > > > > > > > existing > > > > > > > > > > > > > > features? > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a > > > > single > > > > > > > > request. > > > > > > > > > > For > > > > > > > > > > > > > > DELETE, the version field doesn't make sense. > So, I > > > > guess > > > > > > the > > > > > > > > > > broker > > > > > > > > > > > > just > > > > > > > > > > > > > > ignores this? An alternative way is to have a > > > separate > > > > > > > > > > > > > > DeleteFeaturesRequest > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have "The > > > > > monotonically > > > > > > > > > > > increasing > > > > > > > > > > > > > > version of the metadata for finalized features." > I > > am > > > > > > > wondering > > > > > > > > > why > > > > > > > > > > > the > > > > > > > > > > > > > > ordering is important? > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for this > > new > > > > > > > request? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 101. For the broker registration ZK node, should > we > > > > bump > > > > > up > > > > > > > the > > > > > > > > > > > version > > > > > > > > > > > > > in > > > > > > > > > > > > > > the json? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if we > need > > > the > > > > > > epoch > > > > > > > > > > field. > > > > > > > > > > > > Each > > > > > > > > > > > > > > ZK node has an internal version field that is > > > > incremented > > > > > > on > > > > > > > > > every > > > > > > > > > > > > > update. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a feature > > > > version > > > > > > > > > > cluster-wide > > > > > > > > > > > > is > > > > > > > > > > > > > > left to the discretion of the logic implementing > > the > > > > > > feature > > > > > > > > (ex: > > > > > > > > > > can > > > > > > > > > > > > be > > > > > > > > > > > > > > done via dynamic broker config)." Does that mean > > the > > > > > broker > > > > > > > > > > > > registration > > > > > > > > > > > > > ZK > > > > > > > > > > > > > > node will be updated dynamically when this > happens? > > > > > > > > > > > > > > > > > > > > > > > > > > > > 104. UpdateMetadataRequest > > > > > > > > > > > > > > 104.1 It would be useful to describe when the > > feature > > > > > > > metadata > > > > > > > > is > > > > > > > > > > > > > included > > > > > > > > > > > > > > in the request. My understanding is that it's > only > > > > > included > > > > > > > if > > > > > > > > > (1) > > > > > > > > > > > > there > > > > > > > > > > > > > is > > > > > > > > > > > > > > a change to the finalized feature; (2) broker > > > restart; > > > > > (3) > > > > > > > > > > controller > > > > > > > > > > > > > > failover. > > > > > > > > > > > > > > 104.2 The new fields have the following versions. > > Why > > > > are > > > > > > the > > > > > > > > > > > versions > > > > > > > > > > > > 3+ > > > > > > > > > > > > > > when the top version is bumped to 6? > > > > > > > > > > > > > > "fields": [ > > > > > > > > > > > > > > {"name": "Name", "type": "string", > > > "versions": > > > > > > > "3+", > > > > > > > > > > > > > > "about": "The name of the feature."}, > > > > > > > > > > > > > > {"name": "Version", "type": "int64", > > > > > "versions": > > > > > > > > "3+", > > > > > > > > > > > > > > "about": "The finalized version for the > > > > > > feature."} > > > > > > > > > > > > > > ] > > > > > > > > > > > > > > > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using > > > update/delete, > > > > > > > perhaps > > > > > > > > > > it's > > > > > > > > > > > > > better > > > > > > > > > > > > > > to use enable/disable? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam > < > > > > > > > > > > > > kpraka...@confluent.io > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hey Boyang, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the great feedback! I have updated > the > > > KIP > > > > > > based > > > > > > > > on > > > > > > > > > > your > > > > > > > > > > > > > > > feedback. > > > > > > > > > > > > > > > Please find my response below for your > comments, > > > look > > > > > for > > > > > > > > > > sentences > > > > > > > > > > > > > > > starting > > > > > > > > > > > > > > > with "(Kowshik)" below. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin > > > > handling > > > > > > EOS > > > > > > > > > > > traffic" > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > converted as "When is it safe for the brokers > > to > > > > > start > > > > > > > > > serving > > > > > > > > > > > new > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not > > > > > explained > > > > > > > > > earlier > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > context. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata > > > > version > > > > > > > > number > > > > > > > > > > part > > > > > > > > > > > > > > seems a > > > > > > > > > > > > > > > > bit blurred. Could you point a reference to > > later > > > > > > section > > > > > > > > > that > > > > > > > > > > we > > > > > > > > > > > > > going > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > store it in Zookeeper and update it every > time > > > when > > > > > > there > > > > > > > > is > > > > > > > > > a > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > change? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a > > > reference > > > > in > > > > > > the > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a > > > > > Non-goal > > > > > > of > > > > > > > > the > > > > > > > > > > > KIP, > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > features such as group coordinator semantics, > > > there > > > > > is > > > > > > no > > > > > > > > > legal > > > > > > > > > > > > > > scenario > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > perform a downgrade at all. So having > downgrade > > > > door > > > > > > open > > > > > > > > is > > > > > > > > > > > pretty > > > > > > > > > > > > > > > > error-prone as human faults happen all the > > time. > > > > I'm > > > > > > > > assuming > > > > > > > > > > as > > > > > > > > > > > > new > > > > > > > > > > > > > > > > features are implemented, it's not very hard > to > > > > add a > > > > > > > flag > > > > > > > > > > during > > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > creation to indicate whether this feature is > > > > > > > > "downgradable". > > > > > > > > > > > Could > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > explain a bit more on the extra engineering > > > effort > > > > > for > > > > > > > > > shipping > > > > > > > > > > > > this > > > > > > > > > > > > > > KIP > > > > > > > > > > > > > > > > with downgrade protection in place? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and disagree > > > here. > > > > > > While > > > > > > > I > > > > > > > > > > agree > > > > > > > > > > > > that > > > > > > > > > > > > > > > accidental > > > > > > > > > > > > > > > downgrades can cause problems, I also think > > > sometimes > > > > > > > > > downgrades > > > > > > > > > > > > should > > > > > > > > > > > > > > > be allowed for emergency reasons (not all > > > downgrades > > > > > > cause > > > > > > > > > > issues). > > > > > > > > > > > > > > > It is just subjective to the feature being > > > > downgraded. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To be more strict about feature version > > > downgrades, I > > > > > > have > > > > > > > > > > modified > > > > > > > > > > > > the > > > > > > > > > > > > > > KIP > > > > > > > > > > > > > > > proposing that we mandate a `--force-downgrade` > > > flag > > > > be > > > > > > > used > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > > > > > > UPDATE_FEATURES api > > > > > > > > > > > > > > > and the tooling, whenever the human is > > downgrading > > > a > > > > > > > > finalized > > > > > > > > > > > > feature > > > > > > > > > > > > > > > version. > > > > > > > > > > > > > > > Hopefully this should cover the requirement, > > until > > > we > > > > > > find > > > > > > > > the > > > > > > > > > > need > > > > > > > > > > > > for > > > > > > > > > > > > > > > advanced downgrade support. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of > > feature > > > > > > > versions > > > > > > > > > will > > > > > > > > > > > be > > > > > > > > > > > > > > > defined > > > > > > > > > > > > > > > > in the broker code." So this means in order > to > > > > > > restrict a > > > > > > > > > > certain > > > > > > > > > > > > > > > feature, > > > > > > > > > > > > > > > > we need to start the broker first and then > > send a > > > > > > feature > > > > > > > > > > gating > > > > > > > > > > > > > > request > > > > > > > > > > > > > > > > immediately, which introduces a time gap and > > the > > > > > > > > > > > intended-to-close > > > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > could actually serve request during this > phase. > > > Do > > > > > you > > > > > > > > think > > > > > > > > > we > > > > > > > > > > > > > should > > > > > > > > > > > > > > > also > > > > > > > > > > > > > > > > support configurations as well so that admin > > user > > > > > could > > > > > > > > > freely > > > > > > > > > > > roll > > > > > > > > > > > > > up > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > cluster with all nodes complying the same > > feature > > > > > > gating, > > > > > > > > > > without > > > > > > > > > > > > > > > worrying > > > > > > > > > > > > > > > > about the turnaround time to propagate the > > > message > > > > > only > > > > > > > > after > > > > > > > > > > the > > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > starts up? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): This is a great point/question. One > of > > > the > > > > > > > > > > expectations > > > > > > > > > > > > out > > > > > > > > > > > > > of > > > > > > > > > > > > > > > this KIP, which is > > > > > > > > > > > > > > > already followed in the broker, is the > following. > > > > > > > > > > > > > > > - Imagine at time T1 the broker starts up and > > > > > registers > > > > > > > it’s > > > > > > > > > > > > presence > > > > > > > > > > > > > in > > > > > > > > > > > > > > > ZK, > > > > > > > > > > > > > > > along with advertising it’s supported > > features. > > > > > > > > > > > > > > > - Imagine at a future time T2 the broker > > receives > > > > the > > > > > > > > > > > > > > > UpdateMetadataRequest > > > > > > > > > > > > > > > from the controller, which contains the > latest > > > > > > finalized > > > > > > > > > > > features > > > > > > > > > > > > as > > > > > > > > > > > > > > > seen by > > > > > > > > > > > > > > > the controller. The broker validates this > data > > > > > against > > > > > > > > it’s > > > > > > > > > > > > > supported > > > > > > > > > > > > > > > features to > > > > > > > > > > > > > > > make sure there is no mismatch (it will > > shutdown > > > > if > > > > > > > there > > > > > > > > is > > > > > > > > > > an > > > > > > > > > > > > > > > incompatibility). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It is expected that during the time between > the 2 > > > > > events > > > > > > T1 > > > > > > > > and > > > > > > > > > > T2, > > > > > > > > > > > > the > > > > > > > > > > > > > > > broker is > > > > > > > > > > > > > > > almost a silent entity in the cluster. It does > > not > > > > add > > > > > > any > > > > > > > > > value > > > > > > > > > > to > > > > > > > > > > > > the > > > > > > > > > > > > > > > cluster, or carry > > > > > > > > > > > > > > > out any important broker activities. By > > > “important”, > > > > I > > > > > > mean > > > > > > > > it > > > > > > > > > is > > > > > > > > > > > not > > > > > > > > > > > > > > doing > > > > > > > > > > > > > > > mutations > > > > > > > > > > > > > > > on it’s persistence, not mutating critical > > > in-memory > > > > > > state, > > > > > > > > > won’t > > > > > > > > > > > be > > > > > > > > > > > > > > > serving > > > > > > > > > > > > > > > produce/fetch requests. Note it doesn’t even > know > > > > it’s > > > > > > > > assigned > > > > > > > > > > > > > > partitions > > > > > > > > > > > > > > > until > > > > > > > > > > > > > > > it receives UpdateMetadataRequest from > > controller. > > > > > > Anything > > > > > > > > the > > > > > > > > > > > > broker > > > > > > > > > > > > > is > > > > > > > > > > > > > > > doing up > > > > > > > > > > > > > > > until this point is not damaging/useful. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see this > new > > > > > > section: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or > deleting > > an > > > > > > > existing > > > > > > > > > > > > Feature", > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > I misunderstood something, I thought the > > features > > > > are > > > > > > > > defined > > > > > > > > > > in > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > code, so admin could not really create a new > > > > feature? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! You understood this > > right. > > > > Here > > > > > > > > adding > > > > > > > > > a > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > means we are > > > > > > > > > > > > > > > adding a cluster-wide finalized *max* version > > for a > > > > > > feature > > > > > > > > > that > > > > > > > > > > > was > > > > > > > > > > > > > > > previously never finalized. > > > > > > > > > > > > > > > I have clarified this in the KIP now. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I think we need a separate error code like > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > reject a concurrent feature update request. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified the KIP > > > > adding > > > > > > the > > > > > > > > > above > > > > > > > > > > > (see > > > > > > > > > > > > > > > 'Tooling support -> Admin API changes'). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. I think we haven't discussed the > alternative > > > > > > solution > > > > > > > to > > > > > > > > > > pass > > > > > > > > > > > > the > > > > > > > > > > > > > > > > feature information through Zookeeper. Is > that > > > > > > mentioned > > > > > > > in > > > > > > > > > the > > > > > > > > > > > KIP > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more > > > favorable? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads > > > finalized > > > > > > > feature > > > > > > > > > info > > > > > > > > > > > > > stored > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > ZK, > > > > > > > > > > > > > > > only during startup when it does a validation. > > When > > > > > > serving > > > > > > > > > > > > > > > `ApiVersionsRequest`, the > > > > > > > > > > > > > > > broker does not read this info from ZK > directly. > > > I'd > > > > > > > imagine > > > > > > > > > the > > > > > > > > > > > risk > > > > > > > > > > > > > is > > > > > > > > > > > > > > > that it can increase > > > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck for > the > > > > > system. > > > > > > > > > Today, > > > > > > > > > > in > > > > > > > > > > > > > Kafka > > > > > > > > > > > > > > > we use the > > > > > > > > > > > > > > > controller to fan out ZK updates to brokers and > > we > > > > want > > > > > > to > > > > > > > > > stick > > > > > > > > > > to > > > > > > > > > > > > > that > > > > > > > > > > > > > > > pattern to avoid > > > > > > > > > > > > > > > the ZK read bottleneck when serving > > > > > `ApiVersionsRequest`. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 8. I was under the impression that user could > > > > > > configure a > > > > > > > > > range > > > > > > > > > > > of > > > > > > > > > > > > > > > > supported versions, what's the trade-off for > > > > allowing > > > > > > > > single > > > > > > > > > > > > > finalized > > > > > > > > > > > > > > > > version only? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great question! The finalized > version > > > of a > > > > > > > feature > > > > > > > > > > > > basically > > > > > > > > > > > > > > > refers to > > > > > > > > > > > > > > > the cluster-wide finalized feature "maximum" > > > version. > > > > > For > > > > > > > > > > example, > > > > > > > > > > > if > > > > > > > > > > > > > the > > > > > > > > > > > > > > > 'group_coordinator' feature > > > > > > > > > > > > > > > has the finalized version set to 10, then, it > > means > > > > > that > > > > > > > > > > > cluster-wide > > > > > > > > > > > > > all > > > > > > > > > > > > > > > versions upto v10 are > > > > > > > > > > > > > > > supported for this feature. However, note that > if > > > > some > > > > > > > > version > > > > > > > > > > (ex: > > > > > > > > > > > > v0) > > > > > > > > > > > > > > > gets deprecated > > > > > > > > > > > > > > > for this feature, then we don’t convey that > using > > > > this > > > > > > > scheme > > > > > > > > > > (also > > > > > > > > > > > > > > > supporting deprecation is a non-goal). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at all > > points, > > > > > > > refering > > > > > > > > to > > > > > > > > > > > > > finalized > > > > > > > > > > > > > > > feature "maximum" versions. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the > > > > "client" > > > > > > here > > > > > > > > may > > > > > > > > > > be > > > > > > > > > > > a > > > > > > > > > > > > > > > producer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen < > > > > > > > > > > > > > reluctanthero...@gmail.com> > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hey Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple of > > > > > questions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to begin > > > > handling > > > > > > EOS > > > > > > > > > > > traffic" > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > converted as "When is it safe for the brokers > > to > > > > > start > > > > > > > > > serving > > > > > > > > > > > new > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS is not > > > > > explained > > > > > > > > > earlier > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > context. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the metadata > > > > version > > > > > > > > number > > > > > > > > > > part > > > > > > > > > > > > > > seems a > > > > > > > > > > > > > > > > bit blurred. Could you point a reference to > > later > > > > > > section > > > > > > > > > that > > > > > > > > > > we > > > > > > > > > > > > > going > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > store it in Zookeeper and update it every > time > > > when > > > > > > there > > > > > > > > is > > > > > > > > > a > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > change? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. For the feature downgrade, although it's a > > > > > Non-goal > > > > > > of > > > > > > > > the > > > > > > > > > > > KIP, > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > features such as group coordinator semantics, > > > there > > > > > is > > > > > > no > > > > > > > > > legal > > > > > > > > > > > > > > scenario > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > perform a downgrade at all. So having > downgrade > > > > door > > > > > > open > > > > > > > > is > > > > > > > > > > > pretty > > > > > > > > > > > > > > > > error-prone as human faults happen all the > > time. > > > > I'm > > > > > > > > assuming > > > > > > > > > > as > > > > > > > > > > > > new > > > > > > > > > > > > > > > > features are implemented, it's not very hard > to > > > > add a > > > > > > > flag > > > > > > > > > > during > > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > creation to indicate whether this feature is > > > > > > > > "downgradable". > > > > > > > > > > > Could > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > explain a bit more on the extra engineering > > > effort > > > > > for > > > > > > > > > shipping > > > > > > > > > > > > this > > > > > > > > > > > > > > KIP > > > > > > > > > > > > > > > > with downgrade protection in place? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary of > > feature > > > > > > > versions > > > > > > > > > will > > > > > > > > > > > be > > > > > > > > > > > > > > > defined > > > > > > > > > > > > > > > > in the broker code." So this means in order > to > > > > > > restrict a > > > > > > > > > > certain > > > > > > > > > > > > > > > feature, > > > > > > > > > > > > > > > > we need to start the broker first and then > > send a > > > > > > feature > > > > > > > > > > gating > > > > > > > > > > > > > > request > > > > > > > > > > > > > > > > immediately, which introduces a time gap and > > the > > > > > > > > > > > intended-to-close > > > > > > > > > > > > > > > feature > > > > > > > > > > > > > > > > could actually serve request during this > phase. > > > Do > > > > > you > > > > > > > > think > > > > > > > > > we > > > > > > > > > > > > > should > > > > > > > > > > > > > > > also > > > > > > > > > > > > > > > > support configurations as well so that admin > > user > > > > > could > > > > > > > > > freely > > > > > > > > > > > roll > > > > > > > > > > > > > up > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > cluster with all nodes complying the same > > feature > > > > > > gating, > > > > > > > > > > without > > > > > > > > > > > > > > > worrying > > > > > > > > > > > > > > > > about the turnaround time to propagate the > > > message > > > > > only > > > > > > > > after > > > > > > > > > > the > > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > starts up? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or > deleting > > an > > > > > > > existing > > > > > > > > > > > > Feature", > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > I misunderstood something, I thought the > > features > > > > are > > > > > > > > defined > > > > > > > > > > in > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > code, so admin could not really create a new > > > > feature? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I think we need a separate error code like > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > reject a concurrent feature update request. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. I think we haven't discussed the > alternative > > > > > > solution > > > > > > > to > > > > > > > > > > pass > > > > > > > > > > > > the > > > > > > > > > > > > > > > > feature information through Zookeeper. Is > that > > > > > > mentioned > > > > > > > in > > > > > > > > > the > > > > > > > > > > > KIP > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more > > > favorable? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 8. I was under the impression that user could > > > > > > configure a > > > > > > > > > range > > > > > > > > > > > of > > > > > > > > > > > > > > > > supported versions, what's the trade-off for > > > > allowing > > > > > > > > single > > > > > > > > > > > > > finalized > > > > > > > > > > > > > > > > version only? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here the > > > > "client" > > > > > > here > > > > > > > > may > > > > > > > > > > be > > > > > > > > > > > a > > > > > > > > > > > > > > > producer > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Boyang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe > < > > > > > > > > > > cmcc...@apache.org > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik > > > Prakasam > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Colin, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback! I've changed the > > KIP > > > > to > > > > > > > > address > > > > > > > > > > your > > > > > > > > > > > > > > > > > > suggestions. > > > > > > > > > > > > > > > > > > Please find below my explanation. Here > is a > > > > link > > > > > to > > > > > > > KIP > > > > > > > > > > 584: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. '__data_version__' is the version of > the > > > > > > finalized > > > > > > > > > > feature > > > > > > > > > > > > > > > metadata > > > > > > > > > > > > > > > > > > (i.e. actual ZK node contents), while the > > > > > > > > > > > '__schema_version__' > > > > > > > > > > > > is > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > version of the schema of the data > persisted > > > in > > > > > ZK. > > > > > > > > These > > > > > > > > > > > serve > > > > > > > > > > > > > > > > different > > > > > > > > > > > > > > > > > > purposes. '__data_version__' is is useful > > > > mainly > > > > > to > > > > > > > > > clients > > > > > > > > > > > > > during > > > > > > > > > > > > > > > > reads, > > > > > > > > > > > > > > > > > > to differentiate between the 2 versions > of > > > > > > eventually > > > > > > > > > > > > consistent > > > > > > > > > > > > > > > > > 'finalized > > > > > > > > > > > > > > > > > > features' metadata (i.e. larger metadata > > > > version > > > > > is > > > > > > > > more > > > > > > > > > > > > recent). > > > > > > > > > > > > > > > > > > '__schema_version__' provides an > additional > > > > > degree > > > > > > of > > > > > > > > > > > > > flexibility, > > > > > > > > > > > > > > > > where > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > we decide to change the schema for > > > '/features' > > > > > node > > > > > > > in > > > > > > > > ZK > > > > > > > > > > (in > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > future), > > > > > > > > > > > > > > > > > > then we can manage broker roll outs > > suitably > > > > > (i.e. > > > > > > > > > > > > > > > > > > serialization/deserialization of the ZK > > data > > > > can > > > > > be > > > > > > > > > handled > > > > > > > > > > > > > > safely). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Kowshik, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If you're talking about a number that lets > > you > > > > know > > > > > > if > > > > > > > > data > > > > > > > > > > is > > > > > > > > > > > > more > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > > less recent, we would typically call that > an > > > > epoch, > > > > > > and > > > > > > > > > not a > > > > > > > > > > > > > > version. > > > > > > > > > > > > > > > > For > > > > > > > > > > > > > > > > > the ZK data structures, the word "version" > is > > > > > > typically > > > > > > > > > > > reserved > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > describing changes to the overall schema of > > the > > > > > data > > > > > > > that > > > > > > > > > is > > > > > > > > > > > > > written > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > ZooKeeper. We don't even really change the > > > > > "version" > > > > > > > of > > > > > > > > > > those > > > > > > > > > > > > > > schemas > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > much, since most changes are > > > > backwards-compatible. > > > > > > But > > > > > > > > we > > > > > > > > > do > > > > > > > > > > > > > include > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > version field just in case. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think we really need an epoch here, > > > > though, > > > > > > > since > > > > > > > > > we > > > > > > > > > > > can > > > > > > > > > > > > > just > > > > > > > > > > > > > > > > look > > > > > > > > > > > > > > > > > at the broker epoch. Whenever the broker > > > > > registers, > > > > > > > its > > > > > > > > > > epoch > > > > > > > > > > > > will > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > greater than the previous broker epoch. > And > > > the > > > > > > newly > > > > > > > > > > > registered > > > > > > > > > > > > > > data > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > take priority. This will be a lot simpler > > than > > > > > > adding > > > > > > > a > > > > > > > > > > > separate > > > > > > > > > > > > > > epoch > > > > > > > > > > > > > > > > > system, I think. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Regarding admin client needing min and > > max > > > > > > > > > information - > > > > > > > > > > > you > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > right! > > > > > > > > > > > > > > > > > > I've changed the KIP such that the Admin > > API > > > > also > > > > > > > > allows > > > > > > > > > > the > > > > > > > > > > > > user > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > read > > > > > > > > > > > > > > > > > > 'supported features' from a specific > > broker. > > > > > Please > > > > > > > > look > > > > > > > > > at > > > > > > > > > > > the > > > > > > > > > > > > > > > section > > > > > > > > > > > > > > > > > > "Admin API changes". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs `Long` > - > > it > > > > was > > > > > > not > > > > > > > > > > > > deliberate. > > > > > > > > > > > > > > > I've > > > > > > > > > > > > > > > > > > improved the KIP to just use `long` at > all > > > > > places. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand > > tool > > > - > > > > > you > > > > > > > are > > > > > > > > > > right! > > > > > > > > > > > > > I've > > > > > > > > > > > > > > > > > updated > > > > > > > > > > > > > > > > > > the KIP sketching the functionality > > provided > > > by > > > > > > this > > > > > > > > > tool, > > > > > > > > > > > with > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > > examples. Please look at the section > > "Tooling > > > > > > support > > > > > > > > > > > > examples". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, Kowshik. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > cheers, > > > > > > > > > > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin > > > McCabe < > > > > > > > > > > > > > cmcc...@apache.org> > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In the "Schema" section, do we really > > need > > > > both > > > > > > > > > > > > > > __schema_version__ > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > __data_version__? Can we just have a > > > single > > > > > > > version > > > > > > > > > > field > > > > > > > > > > > > > here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) function > have > > > > some > > > > > > way > > > > > > > to > > > > > > > > > get > > > > > > > > > > > the > > > > > > > > > > > > > min > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > max > > > > > > > > > > > > > > > > > > > information that we're exposing as > > well? I > > > > > guess > > > > > > > we > > > > > > > > > > could > > > > > > > > > > > > have > > > > > > > > > > > > > > > min, > > > > > > > > > > > > > > > > > max, > > > > > > > > > > > > > > > > > > > and current. Unrelated: is the use of > > Long > > > > > > rather > > > > > > > > than > > > > > > > > > > > long > > > > > > > > > > > > > > > > deliberate > > > > > > > > > > > > > > > > > > > here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It would be good to describe how the > > > command > > > > > line > > > > > > > > tool > > > > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will work. > > For > > > > > > example > > > > > > > > the > > > > > > > > > > > flags > > > > > > > > > > > > > that > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > take and the output that it will > generate > > > to > > > > > > > STDOUT. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > cheers, > > > > > > > > > > > > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik > > > > > Prakasam > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've opened KIP-584 > > > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> > < > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KIP-584 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > > > > > is intended to provide a versioning > > > scheme > > > > > for > > > > > > > > > > features. > > > > > > > > > > > > I'd > > > > > > > > > > > > > > like > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > use > > > > > > > > > > > > > > > > > > > > this thread to discuss the same. I'd > > > > > appreciate > > > > > > > any > > > > > > > > > > > > feedback > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > this. > > > > > > > > > > > > > > > > > > > > Here > > > > > > > > > > > > > > > > > > > > is a link to KIP-584 > > > > > > > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > > > Kowshik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >