Hi Kowshik, Thanks for the KIP, this is exciting!
The KIP includes examples on how operators could use the command line utility, etc. It would be great to add some high-level details on how the upgrade workflow changes overall with the addition of feature versions. - Dhruvil On Wed, Apr 15, 2020 at 6:29 PM Kowshik Prakasam <kpraka...@confluent.io> wrote: > Hi Jun, > > Sorry the links were broken in my last response, here are the right links: > > 200. > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioning > Scheme For Features-Validations > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations > > > 110. > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-When > To Use Versioned Feature Flags? > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags > ?> > > > Cheers, > Kowshik > > On Wed, Apr 15, 2020 at 6:24 PM Kowshik Prakasam <kpraka...@confluent.io> > wrote: > > > > > Hi Jun, > > > > Thanks for the feedback! I have addressed the comments in the KIP. > > > > > 200. In the validation section, there is still the text "*from* > > > {"max_version_level": > > > X} *to* {"max_version_level": X’}". It seems that it should say "from X > > to > > > Y"? > > > > (Kowshik): Done. I have reworded it a bit to make it clearer now in this > > section: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations > > > > > 110. Could we add that we need to document the bumped version of each > > > feature in the upgrade section of a release? > > > > (Kowshik): Great point! Done, I have mentioned it in #3 this section: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584 > > <https://issues.apache.org/jira/browse/KIP-584> > > %3A+Versioning+scheme+for+features#KIP-584 > > <https://issues.apache.org/jira/browse/KIP-584> > > :Versioningschemeforfeatures-Whentouseversionedfeatureflags? > > > > > > Cheers, > > Kowshik > > > > On Wed, Apr 15, 2020 at 4:00 PM Jun Rao <j...@confluent.io> wrote: > > > >> Hi, Kowshik, > >> > >> Looks good to me now. Just a couple of minor things below. > >> > >> 200. In the validation section, there is still the text "*from* > >> {"max_version_level": > >> X} *to* {"max_version_level": X’}". It seems that it should say "from X > to > >> Y"? > >> > >> 110. Could we add that we need to document the bumped version of each > >> feature in the upgrade section of a release? > >> > >> Thanks, > >> > >> Jun > >> > >> On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam < > kpraka...@confluent.io> > >> wrote: > >> > >> > Hi Jun, > >> > > >> > Thank you for the suggestion! I have updated the KIP, please find my > >> > response below. > >> > > >> > > 200. I guess you are saying only when the allowDowngrade field is > set, > >> > the > >> > > finalized feature version can go backward. Otherwise, it can only go > >> up. > >> > > That makes sense. It would be useful to make that clear when > >> explaining > >> > > the usage of the allowDowngrade field. In the validation section, we > >> > have " > >> > > /features' from {"max_version_level": X} to {"max_version_level": > >> X’}", > >> > it > >> > > seems that we need to mention Y there. > >> > > >> > (Kowshik): Great point! Yes, that is correct. Done, I have updated the > >> > validations > >> > section explaining the above. Here is a link to this section: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations > >> > > >> > > >> > Cheers, > >> > Kowshik > >> > > >> > > >> > > >> > > >> > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <j...@confluent.io> wrote: > >> > > >> > > Hi, Kowshik, > >> > > > >> > > 200. I guess you are saying only when the allowDowngrade field is > set, > >> > the > >> > > finalized feature version can go backward. Otherwise, it can only go > >> up. > >> > > That makes sense. It would be useful to make that clear when > >> explaining > >> > > the usage of the allowDowngrade field. In the validation section, we > >> have > >> > > " > >> > > /features' from {"max_version_level": X} to {"max_version_level": > >> X’}", > >> > it > >> > > seems that we need to mention Y there. > >> > > > >> > > Thanks, > >> > > > >> > > Jun > >> > > > >> > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam < > >> > kpraka...@confluent.io> > >> > > wrote: > >> > > > >> > > > Hi Jun, > >> > > > > >> > > > Great question! Please find my response below. > >> > > > > >> > > > > 200. My understanding is that If the CLI tool passes the > >> > > > > '--allow-downgrade' flag when updating a specific feature, then > a > >> > > future > >> > > > > downgrade is possible. Otherwise, the feature is now > >> downgradable. If > >> > > so, > >> > > > I > >> > > > > was wondering how the controller remembers this since it can be > >> > > restarted > >> > > > > over time? > >> > > > > >> > > > (Kowshik): The purpose of the flag was to just restrict the user > >> intent > >> > > for > >> > > > a specific request. > >> > > > It seems to me that to avoid confusion, I could call the flag as > >> > > > `--try-downgrade` instead. > >> > > > Then this makes it clear, that, the controller just has to > consider > >> the > >> > > ask > >> > > > from > >> > > > the user as an explicit request to attempt a downgrade. > >> > > > > >> > > > The flag does not act as an override on controller's decision > making > >> > that > >> > > > decides whether > >> > > > a flag is downgradable (these decisions on whether to allow a flag > >> to > >> > be > >> > > > downgraded > >> > > > from a specific version level, can be embedded in the controller > >> code). > >> > > > > >> > > > Please let me know what you think. > >> > > > Sorry if I misunderstood the original question. > >> > > > > >> > > > > >> > > > Cheers, > >> > > > Kowshik > >> > > > > >> > > > > >> > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <j...@confluent.io> wrote: > >> > > > > >> > > > > Hi, Kowshik, > >> > > > > > >> > > > > Thanks for the reply. Makes sense. Just one more question. > >> > > > > > >> > > > > 200. My understanding is that If the CLI tool passes the > >> > > > > '--allow-downgrade' flag when updating a specific feature, then > a > >> > > future > >> > > > > downgrade is possible. Otherwise, the feature is now > >> downgradable. If > >> > > > so, I > >> > > > > was wondering how the controller remembers this since it can be > >> > > restarted > >> > > > > over time? > >> > > > > > >> > > > > Jun > >> > > > > > >> > > > > > >> > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam < > >> > > kpraka...@confluent.io > >> > > > > > >> > > > > wrote: > >> > > > > > >> > > > > > Hi Jun, > >> > > > > > > >> > > > > > Thanks a lot for the feedback and the questions! > >> > > > > > Please find my response below. > >> > > > > > > >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade > >> field. > >> > It > >> > > > > seems > >> > > > > > > that field needs to be persisted somewhere in ZK? > >> > > > > > > >> > > > > > (Kowshik): Great question! Below is my explanation. Please > help > >> me > >> > > > > > understand, > >> > > > > > if you feel there are cases where we would need to still > >> persist it > >> > > in > >> > > > > ZK. > >> > > > > > > >> > > > > > Firstly I have updated my thoughts into the KIP now, under the > >> > > > > 'guidelines' > >> > > > > > section: > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > >> > > > > > > >> > > > > > The allowDowngrade boolean field is just to restrict the user > >> > intent, > >> > > > and > >> > > > > > to remind > >> > > > > > them to double check their intent before proceeding. It should > >> be > >> > set > >> > > > to > >> > > > > > true > >> > > > > > by the user in a request, only when the user intent is to > >> > forcefully > >> > > > > > "attempt" a > >> > > > > > downgrade of a specific feature's max version level, to the > >> > provided > >> > > > > value > >> > > > > > in > >> > > > > > the request. > >> > > > > > > >> > > > > > We can extend this safeguard. The controller (on it's end) can > >> > > maintain > >> > > > > > rules in the code, that, for safety reasons would outright > >> reject > >> > > > certain > >> > > > > > downgrades > >> > > > > > from a specific max_version_level for a specific feature. Such > >> > > > rejections > >> > > > > > may > >> > > > > > happen depending on the feature being downgraded, and from > what > >> > > version > >> > > > > > level. > >> > > > > > > >> > > > > > The CLI tool only allows a downgrade attempt in conjunction > with > >> > > > specific > >> > > > > > flags and sub-commands. For example, in the CLI tool, if the > >> user > >> > > uses > >> > > > > the > >> > > > > > 'downgrade-all' command, or passes '--allow-downgrade' flag > when > >> > > > > updating a > >> > > > > > specific feature, only then the tool will translate this ask > to > >> > > setting > >> > > > > > 'allowDowngrade' field in the request to the server. > >> > > > > > > >> > > > > > > 201. UpdateFeaturesResponse has the following top level > >> fields. > >> > > > Should > >> > > > > > > those fields be per feature? > >> > > > > > > > >> > > > > > > "fields": [ > >> > > > > > > { "name": "ErrorCode", "type": "int16", "versions": > "0+", > >> > > > > > > "about": "The error code, or 0 if there was no error." > >> }, > >> > > > > > > { "name": "ErrorMessage", "type": "string", "versions": > >> "0+", > >> > > > > > > "about": "The error message, or null if there was no > >> > error." > >> > > } > >> > > > > > > ] > >> > > > > > > >> > > > > > (Kowshik): Great question! > >> > > > > > As such, the API is transactional, as explained in the > sections > >> > > linked > >> > > > > > below. > >> > > > > > Either all provided FeatureUpdate was applied, or none. > >> > > > > > It's the reason I felt we can have just one error code + > >> message. > >> > > > > > Happy to extend this if you feel otherwise. Please let me > know. > >> > > > > > > >> > > > > > Link to sections: > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees > >> > > > > > > >> > > > > > > 202. The /features path in ZK has a field min_version_level. > >> > Which > >> > > > API > >> > > > > > and > >> > > > > > > tool can change that value? > >> > > > > > > >> > > > > > (Kowshik): Great question! Currently this cannot be modified > by > >> > using > >> > > > the > >> > > > > > API or the tool. > >> > > > > > Feature version deprecation (by raising min_version_level) can > >> be > >> > > done > >> > > > > only > >> > > > > > by the Controller directly. The rationale is explained in this > >> > > section: > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation > >> > > > > > > >> > > > > > > >> > > > > > Cheers, > >> > > > > > Kowshik > >> > > > > > > >> > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <j...@confluent.io> > >> wrote: > >> > > > > > > >> > > > > > > Hi, Kowshik, > >> > > > > > > > >> > > > > > > Thanks for addressing those comments. Just a few more minor > >> > > comments. > >> > > > > > > > >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade > >> field. > >> > It > >> > > > > seems > >> > > > > > > that field needs to be persisted somewhere in ZK? > >> > > > > > > > >> > > > > > > 201. UpdateFeaturesResponse has the following top level > >> fields. > >> > > > Should > >> > > > > > > those fields be per feature? > >> > > > > > > > >> > > > > > > "fields": [ > >> > > > > > > { "name": "ErrorCode", "type": "int16", "versions": > "0+", > >> > > > > > > "about": "The error code, or 0 if there was no error." > >> }, > >> > > > > > > { "name": "ErrorMessage", "type": "string", "versions": > >> "0+", > >> > > > > > > "about": "The error message, or null if there was no > >> > error." > >> > > } > >> > > > > > > ] > >> > > > > > > > >> > > > > > > 202. The /features path in ZK has a field min_version_level. > >> > Which > >> > > > API > >> > > > > > and > >> > > > > > > tool can change that value? > >> > > > > > > > >> > > > > > > Jun > >> > > > > > > > >> > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam < > >> > > > > kpraka...@confluent.io > >> > > > > > > > >> > > > > > > wrote: > >> > > > > > > > >> > > > > > > > Hi Jun, > >> > > > > > > > > >> > > > > > > > Thanks for the feedback! I have updated the KIP-584 > >> addressing > >> > > your > >> > > > > > > > comments. > >> > > > > > > > Please find my response below. > >> > > > > > > > > >> > > > > > > > > 100.6 You can look for the sentence "This operation > >> requires > >> > > > ALTER > >> > > > > on > >> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in > >> > > > > > > > > KafkaApis.authorize(). > >> > > > > > > > > >> > > > > > > > (Kowshik): Done. Great point! For the newly introduced > >> > > > > UPDATE_FEATURES > >> > > > > > > api, > >> > > > > > > > I have added a > >> > > > > > > > requirement that AclOperation.ALTER is required on > >> > > > > > ResourceType.CLUSTER. > >> > > > > > > > > >> > > > > > > > > 110. Keeping the feature version as int is probably > fine. > >> I > >> > > just > >> > > > > felt > >> > > > > > > > that > >> > > > > > > > > for some of the common user interactions, it's more > >> > convenient > >> > > to > >> > > > > > > > > relate that to a release version. For example, if a user > >> > wants > >> > > to > >> > > > > > > > downgrade > >> > > > > > > > > to a release 2.5, it's easier for the user to use the > tool > >> > like > >> > > > > "tool > >> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature > X > >> > > > --version > >> > > > > > 6". > >> > > > > > > > > >> > > > > > > > (Kowshik): Great point. Generally, maximum feature version > >> > levels > >> > > > are > >> > > > > > not > >> > > > > > > > downgradable after > >> > > > > > > > they are finalized in the cluster. This is because, as a > >> > > guideline > >> > > > > > > bumping > >> > > > > > > > feature version level usually is used mainly to convey > >> > important > >> > > > > > breaking > >> > > > > > > > changes. > >> > > > > > > > Despite the above, there may be some extreme/rare cases > >> where a > >> > > > user > >> > > > > > > wants > >> > > > > > > > to downgrade > >> > > > > > > > all features to a specific previous release. The user may > >> want > >> > to > >> > > > do > >> > > > > > this > >> > > > > > > > just > >> > > > > > > > prior to rolling back a Kafka cluster to a previous > release. > >> > > > > > > > > >> > > > > > > > To support the above, I have made a change to the KIP > >> > explaining > >> > > > that > >> > > > > > the > >> > > > > > > > CLI tool is versioned. > >> > > > > > > > The CLI tool internally has knowledge about a map of > >> features > >> > to > >> > > > > their > >> > > > > > > > respective max > >> > > > > > > > versions supported by the Broker. The tool's knowledge of > >> > > features > >> > > > > and > >> > > > > > > > their version values, > >> > > > > > > > is limited to the version of the CLI tool itself i.e. the > >> > > > information > >> > > > > > is > >> > > > > > > > packaged into the CLI tool > >> > > > > > > > when it is released. Whenever a Kafka release introduces a > >> new > >> > > > > feature > >> > > > > > > > version, or modifies > >> > > > > > > > an existing feature version, the CLI tool shall also be > >> updated > >> > > > with > >> > > > > > this > >> > > > > > > > information, > >> > > > > > > > Newer versions of the CLI tool will be released as part of > >> the > >> > > > Kafka > >> > > > > > > > releases. > >> > > > > > > > > >> > > > > > > > Therefore, to achieve the downgrade need, the user just > >> needs > >> > to > >> > > > run > >> > > > > > the > >> > > > > > > > version of > >> > > > > > > > the CLI tool that's part of the particular previous > release > >> > that > >> > > > > he/she > >> > > > > > > is > >> > > > > > > > downgrading to. > >> > > > > > > > To help the user with this, there is a new command added > to > >> the > >> > > CLI > >> > > > > > tool > >> > > > > > > > called `downgrade-all`. > >> > > > > > > > This essentially downgrades max version levels of all > >> features > >> > in > >> > > > the > >> > > > > > > > cluster to the versions > >> > > > > > > > known to the CLI tool internally. > >> > > > > > > > > >> > > > > > > > I have explained the above in the KIP under these > sections: > >> > > > > > > > > >> > > > > > > > Tooling support (have explained that the CLI tool is > >> > versioned): > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > >> > > > > > > > > >> > > > > > > > Regular CLI tool usage (please refer to point #3, and see > >> the > >> > > > tooling > >> > > > > > > > example) > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > >> > > > > > > > > >> > > > > > > > > 110. Similarly, if the client library finds a feature > >> > mismatch > >> > > > with > >> > > > > > the > >> > > > > > > > broker, > >> > > > > > > > > the client likely needs to log some error message for > the > >> > user > >> > > to > >> > > > > > take > >> > > > > > > > some > >> > > > > > > > > actions. It's much more actionable if the error message > is > >> > > > "upgrade > >> > > > > > the > >> > > > > > > > > broker to release version 2.6" than just "upgrade the > >> broker > >> > to > >> > > > > > feature > >> > > > > > > > > version 7". > >> > > > > > > > > >> > > > > > > > (Kowshik): That's a really good point! If we use ints for > >> > feature > >> > > > > > > versions, > >> > > > > > > > the best > >> > > > > > > > message that client can print for debugging is "broker > >> doesn't > >> > > > > support > >> > > > > > > > feature version 7", and alongside that print the supported > >> > > version > >> > > > > > range > >> > > > > > > > returned > >> > > > > > > > by the broker. Then, does it sound reasonable that the > user > >> > could > >> > > > > then > >> > > > > > > > reference > >> > > > > > > > Kafka release logs to figure out which version of the > broker > >> > > > release > >> > > > > is > >> > > > > > > > required > >> > > > > > > > be deployed, to support feature version 7? I couldn't > think > >> of > >> > a > >> > > > > better > >> > > > > > > > strategy here. > >> > > > > > > > > >> > > > > > > > > 120. When should a developer bump up the version of a > >> > feature? > >> > > > > > > > > >> > > > > > > > (Kowshik): Great question! In the KIP, I have added a > >> section: > >> > > > > > > 'Guidelines > >> > > > > > > > on feature versions and workflows' > >> > > > > > > > providing some guidelines on when to use the versioned > >> feature > >> > > > flags, > >> > > > > > and > >> > > > > > > > what > >> > > > > > > > are the regular workflows with the CLI tool. > >> > > > > > > > > >> > > > > > > > Link to the relevant sections: > >> > > > > > > > Guidelines: > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows > >> > > > > > > > > >> > > > > > > > Regular CLI tool usage: > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage > >> > > > > > > > > >> > > > > > > > Advanced CLI tool usage: > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > Cheers, > >> > > > > > > > Kowshik > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <j...@confluent.io > > > >> > > wrote: > >> > > > > > > > > >> > > > > > > > > Hi, Kowshik, > >> > > > > > > > > > >> > > > > > > > > Thanks for the reply. A few more comments. > >> > > > > > > > > > >> > > > > > > > > 110. Keeping the feature version as int is probably > fine. > >> I > >> > > just > >> > > > > felt > >> > > > > > > > that > >> > > > > > > > > for some of the common user interactions, it's more > >> > convenient > >> > > to > >> > > > > > > > > relate that to a release version. For example, if a user > >> > wants > >> > > to > >> > > > > > > > downgrade > >> > > > > > > > > to a release 2.5, it's easier for the user to use the > tool > >> > like > >> > > > > "tool > >> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature > X > >> > > > --version > >> > > > > > 6". > >> > > > > > > > > Similarly, if the client library finds a feature > mismatch > >> > with > >> > > > the > >> > > > > > > > broker, > >> > > > > > > > > the client likely needs to log some error message for > the > >> > user > >> > > to > >> > > > > > take > >> > > > > > > > some > >> > > > > > > > > actions. It's much more actionable if the error message > is > >> > > > "upgrade > >> > > > > > the > >> > > > > > > > > broker to release version 2.6" than just "upgrade the > >> broker > >> > to > >> > > > > > feature > >> > > > > > > > > version 7". > >> > > > > > > > > > >> > > > > > > > > 111. Sounds good. > >> > > > > > > > > > >> > > > > > > > > 120. When should a developer bump up the version of a > >> > feature? > >> > > > > > > > > > >> > > > > > > > > Jun > >> > > > > > > > > > >> > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam < > >> > > > > > > kpraka...@confluent.io > >> > > > > > > > > > >> > > > > > > > > wrote: > >> > > > > > > > > > >> > > > > > > > > > Hi Jun, > >> > > > > > > > > > > >> > > > > > > > > > I have updated the KIP for the item 111. > >> > > > > > > > > > I'm in the process of addressing 100.6, and will > >> provide an > >> > > > > update > >> > > > > > > > soon. > >> > > > > > > > > > I think item 110 is still under discussion given we > are > >> now > >> > > > > > > providing a > >> > > > > > > > > way > >> > > > > > > > > > to finalize > >> > > > > > > > > > all features to their latest version levels. In any > >> case, > >> > > > please > >> > > > > > let > >> > > > > > > us > >> > > > > > > > > > know > >> > > > > > > > > > how you feel in response to Colin's comments on this > >> topic. > >> > > > > > > > > > > >> > > > > > > > > > > 111. To put this in context, when we had IBP, the > >> default > >> > > > value > >> > > > > > is > >> > > > > > > > the > >> > > > > > > > > > > current released version. So, if you are a brand new > >> > user, > >> > > > you > >> > > > > > > don't > >> > > > > > > > > need > >> > > > > > > > > > > to configure IBP and all new features will be > >> immediately > >> > > > > > available > >> > > > > > > > in > >> > > > > > > > > > the > >> > > > > > > > > > > new cluster. If you are upgrading from an old > version, > >> > you > >> > > do > >> > > > > > need > >> > > > > > > to > >> > > > > > > > > > > understand and configure IBP. I see a similar > pattern > >> > here > >> > > > for > >> > > > > > > > > > > features. From the ease of use perspective, ideally, > >> we > >> > > > > shouldn't > >> > > > > > > > > require > >> > > > > > > > > > a > >> > > > > > > > > > > new user to have an extra step such as running a > >> > bootstrap > >> > > > > script > >> > > > > > > > > unless > >> > > > > > > > > > > it's truly necessary. If someone has a special need > >> (all > >> > > the > >> > > > > > cases > >> > > > > > > > you > >> > > > > > > > > > > mentioned seem special cases?), they can configure a > >> mode > >> > > > such > >> > > > > > that > >> > > > > > > > > > > features are enabled/disabled manually. > >> > > > > > > > > > > >> > > > > > > > > > (Kowshik): That makes sense, thanks for the idea! > Sorry > >> if > >> > I > >> > > > > didn't > >> > > > > > > > > > understand > >> > > > > > > > > > this need earlier. I have updated the KIP with the > >> approach > >> > > > that > >> > > > > > > > whenever > >> > > > > > > > > > the '/features' node is absent, the controller by > >> default > >> > > will > >> > > > > > > > bootstrap > >> > > > > > > > > > the node > >> > > > > > > > > > to contain the latest feature levels. Here is the new > >> > section > >> > > > in > >> > > > > > the > >> > > > > > > > KIP > >> > > > > > > > > > describing > >> > > > > > > > > > the same: > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues > >> > > > > > > > > > > >> > > > > > > > > > Next, as I explained in my response to Colin's > >> suggestions, > >> > > we > >> > > > > are > >> > > > > > > now > >> > > > > > > > > > providing a `--finalize-latest-features` flag with the > >> > > tooling. > >> > > > > > This > >> > > > > > > > lets > >> > > > > > > > > > the sysadmin finalize all features known to the > >> controller > >> > to > >> > > > > their > >> > > > > > > > > latest > >> > > > > > > > > > version > >> > > > > > > > > > levels. Please look at this section (point #3 and the > >> > tooling > >> > > > > > example > >> > > > > > > > > > later): > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > Do you feel this addresses your comment/concern? > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > Cheers, > >> > > > > > > > > > Kowshik > >> > > > > > > > > > > >> > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao < > >> j...@confluent.io> > >> > > > > wrote: > >> > > > > > > > > > > >> > > > > > > > > > > Hi, Kowshik, > >> > > > > > > > > > > > >> > > > > > > > > > > Thanks for the reply. A few more replies below. > >> > > > > > > > > > > > >> > > > > > > > > > > 100.6 You can look for the sentence "This operation > >> > > requires > >> > > > > > ALTER > >> > > > > > > on > >> > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage > in > >> > > > > > > > > > > KafkaApis.authorize(). > >> > > > > > > > > > > > >> > > > > > > > > > > 110. From the external client/tooling perspective, > >> it's > >> > > more > >> > > > > > > natural > >> > > > > > > > to > >> > > > > > > > > > use > >> > > > > > > > > > > the release version for features. If we can use the > >> same > >> > > > > release > >> > > > > > > > > version > >> > > > > > > > > > > for internal representation, it seems simpler > (easier > >> to > >> > > > > > > understand, > >> > > > > > > > no > >> > > > > > > > > > > mapping overhead, etc). Is there a benefit with > >> separate > >> > > > > external > >> > > > > > > and > >> > > > > > > > > > > internal versioning schemes? > >> > > > > > > > > > > > >> > > > > > > > > > > 111. To put this in context, when we had IBP, the > >> default > >> > > > value > >> > > > > > is > >> > > > > > > > the > >> > > > > > > > > > > current released version. So, if you are a brand new > >> > user, > >> > > > you > >> > > > > > > don't > >> > > > > > > > > need > >> > > > > > > > > > > to configure IBP and all new features will be > >> immediately > >> > > > > > available > >> > > > > > > > in > >> > > > > > > > > > the > >> > > > > > > > > > > new cluster. If you are upgrading from an old > version, > >> > you > >> > > do > >> > > > > > need > >> > > > > > > to > >> > > > > > > > > > > understand and configure IBP. I see a similar > pattern > >> > here > >> > > > for > >> > > > > > > > > > > features. From the ease of use perspective, ideally, > >> we > >> > > > > shouldn't > >> > > > > > > > > > require a > >> > > > > > > > > > > new user to have an extra step such as running a > >> > bootstrap > >> > > > > script > >> > > > > > > > > unless > >> > > > > > > > > > > it's truly necessary. If someone has a special need > >> (all > >> > > the > >> > > > > > cases > >> > > > > > > > you > >> > > > > > > > > > > mentioned seem special cases?), they can configure a > >> mode > >> > > > such > >> > > > > > that > >> > > > > > > > > > > features are enabled/disabled manually. > >> > > > > > > > > > > > >> > > > > > > > > > > Jun > >> > > > > > > > > > > > >> > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam < > >> > > > > > > > > kpraka...@confluent.io> > >> > > > > > > > > > > wrote: > >> > > > > > > > > > > > >> > > > > > > > > > > > Hi Jun, > >> > > > > > > > > > > > > >> > > > > > > > > > > > Thanks for the feedback and suggestions. Please > >> find my > >> > > > > > response > >> > > > > > > > > below. > >> > > > > > > > > > > > > >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to > >> > control > >> > > > who > >> > > > > > is > >> > > > > > > > > > allowed > >> > > > > > > > > > > to > >> > > > > > > > > > > > > issue that request if security is enabled. So, > we > >> > need > >> > > to > >> > > > > > > assign > >> > > > > > > > > the > >> > > > > > > > > > > new > >> > > > > > > > > > > > > request a ResourceType and possible > AclOperations. > >> > See > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > >> > > > > > > > > > > > > as an example. > >> > > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): I don't see any reference to the words > >> > > > > ResourceType > >> > > > > > or > >> > > > > > > > > > > > AclOperations > >> > > > > > > > > > > > in the KIP. Please let me know how I can use the > KIP > >> > that > >> > > > you > >> > > > > > > > linked > >> > > > > > > > > to > >> > > > > > > > > > > > know how to > >> > > > > > > > > > > > setup the appropriate ResourceType and/or > >> > > ClusterOperation? > >> > > > > > > > > > > > > >> > > > > > > > > > > > > 105. If we change delete to disable, it's better > >> to > >> > do > >> > > > this > >> > > > > > > > > > > consistently > >> > > > > > > > > > > > in > >> > > > > > > > > > > > > request protocol and admin api as well. > >> > > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable' > >> when > >> > it > >> > > is > >> > > > > > > > deleting > >> > > > > > > > > a > >> > > > > > > > > > > > feature. > >> > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't > >> > have a > >> > > > > > strong > >> > > > > > > > > > > > preference. > >> > > > > > > > > > > > > >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use > >> > int64. > >> > > > > > > Currently, > >> > > > > > > > > our > >> > > > > > > > > > > > > release version schema is major.minor.bugfix > (e.g. > >> > > > 2.5.0). > >> > > > > > It's > >> > > > > > > > > > > possible > >> > > > > > > > > > > > > for new features to be included in minor > releases > >> > too. > >> > > > > Should > >> > > > > > > we > >> > > > > > > > > make > >> > > > > > > > > > > the > >> > > > > > > > > > > > > feature versioning match the release versioning? > >> > > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): The release version can be mapped to a > >> set > >> > of > >> > > > > > feature > >> > > > > > > > > > > versions, > >> > > > > > > > > > > > and this can be done, for example in the tool (or > >> even > >> > > > > external > >> > > > > > > to > >> > > > > > > > > the > >> > > > > > > > > > > > tool). > >> > > > > > > > > > > > Can you please clarify what I'm missing? > >> > > > > > > > > > > > > >> > > > > > > > > > > > > 111. "During regular operations, the data in the > >> ZK > >> > > node > >> > > > > can > >> > > > > > be > >> > > > > > > > > > mutated > >> > > > > > > > > > > > > only via a specific admin API served only by the > >> > > > > > controller." I > >> > > > > > > > am > >> > > > > > > > > > > > > wondering why can't the controller auto > finalize a > >> > > > feature > >> > > > > > > > version > >> > > > > > > > > > > after > >> > > > > > > > > > > > > all brokers are upgraded? For new users who > >> download > >> > > the > >> > > > > > latest > >> > > > > > > > > > version > >> > > > > > > > > > > > to > >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them > to > >> > have > >> > > > to > >> > > > > > > > manually > >> > > > > > > > > > > > enable > >> > > > > > > > > > > > > each feature. > >> > > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here, > >> but > >> > it > >> > > > > will > >> > > > > > > help > >> > > > > > > > > > > > to decide whether the automation can be thought > >> through > >> > > in > >> > > > > the > >> > > > > > > > future > >> > > > > > > > > > > > in a follow up KIP, or right now in this KIP. We > may > >> > > invest > >> > > > > > > > > > > > in automation, but we have to decide whether we > >> should > >> > do > >> > > > it > >> > > > > > > > > > > > now or later. > >> > > > > > > > > > > > > >> > > > > > > > > > > > For the inconvenience that you mentioned, do you > >> think > >> > > the > >> > > > > > > problem > >> > > > > > > > > that > >> > > > > > > > > > > you > >> > > > > > > > > > > > mentioned can be overcome by asking for the > cluster > >> > > > operator > >> > > > > > to > >> > > > > > > > run > >> > > > > > > > > a > >> > > > > > > > > > > > bootstrap script when he/she knows that a > specific > >> AK > >> > > > > release > >> > > > > > > has > >> > > > > > > > > been > >> > > > > > > > > > > > almost completely deployed in a cluster for the > >> first > >> > > time? > >> > > > > > Idea > >> > > > > > > is > >> > > > > > > > > > that > >> > > > > > > > > > > > the > >> > > > > > > > > > > > bootstrap script will know how to map a specific > AK > >> > > release > >> > > > > to > >> > > > > > > > > > finalized > >> > > > > > > > > > > > feature versions, and run the `kafka-features.sh` > >> tool > >> > > > > > > > appropriately > >> > > > > > > > > > > > against > >> > > > > > > > > > > > the cluster. > >> > > > > > > > > > > > > >> > > > > > > > > > > > Now, coming back to your automation > >> proposal/question. > >> > > > > > > > > > > > I do see the value of automated feature version > >> > > > finalization, > >> > > > > > > but I > >> > > > > > > > > > also > >> > > > > > > > > > > > see > >> > > > > > > > > > > > that this will open up several questions and some > >> > risks, > >> > > as > >> > > > > > > > explained > >> > > > > > > > > > > > below. > >> > > > > > > > > > > > The answers to these depend on the definition of > the > >> > > > > automation > >> > > > > > > we > >> > > > > > > > > > choose > >> > > > > > > > > > > > to build, and how well does it fit into a kafka > >> > > deployment. > >> > > > > > > > > > > > Basically, it can be unsafe for the controller to > >> > > finalize > >> > > > > > > feature > >> > > > > > > > > > > version > >> > > > > > > > > > > > upgrades automatically, without learning about the > >> > intent > >> > > > of > >> > > > > > the > >> > > > > > > > > > cluster > >> > > > > > > > > > > > operator. > >> > > > > > > > > > > > 1. We would sometimes want to lock feature > versions > >> > only > >> > > > when > >> > > > > > we > >> > > > > > > > have > >> > > > > > > > > > > > externally verified > >> > > > > > > > > > > > the stability of the broker binary. > >> > > > > > > > > > > > 2. Sometimes only the cluster operator knows that > a > >> > > cluster > >> > > > > > > upgrade > >> > > > > > > > > is > >> > > > > > > > > > > > complete, > >> > > > > > > > > > > > and new brokers are highly unlikely to join the > >> > cluster. > >> > > > > > > > > > > > 3. Only the cluster operator knows that the intent > >> is > >> > to > >> > > > > deploy > >> > > > > > > the > >> > > > > > > > > > same > >> > > > > > > > > > > > version > >> > > > > > > > > > > > of the new broker release across the entire > cluster > >> > (i.e. > >> > > > the > >> > > > > > > > latest > >> > > > > > > > > > > > downloaded version). > >> > > > > > > > > > > > 4. For downgrades, it appears the controller still > >> > needs > >> > > > some > >> > > > > > > > > external > >> > > > > > > > > > > > input > >> > > > > > > > > > > > (such as the proposed tool) to finalize a feature > >> > version > >> > > > > > > > downgrade. > >> > > > > > > > > > > > > >> > > > > > > > > > > > If we have automation, that automation can end up > >> > failing > >> > > > in > >> > > > > > some > >> > > > > > > > of > >> > > > > > > > > > the > >> > > > > > > > > > > > cases > >> > > > > > > > > > > > above. Then, we need a way to declare that the > >> cluster > >> > is > >> > > > > "not > >> > > > > > > > ready" > >> > > > > > > > > > if > >> > > > > > > > > > > > the > >> > > > > > > > > > > > controller cannot automatically finalize some > basic > >> > > > required > >> > > > > > > > feature > >> > > > > > > > > > > > version > >> > > > > > > > > > > > upgrades across the cluster. We need to make the > >> > cluster > >> > > > > > operator > >> > > > > > > > > aware > >> > > > > > > > > > > in > >> > > > > > > > > > > > such a scenario (raise an alert or alike). > >> > > > > > > > > > > > > >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey > >> > should > >> > > > be > >> > > > > 49 > >> > > > > > > > > instead > >> > > > > > > > > > > of > >> > > > > > > > > > > > 48. > >> > > > > > > > > > > > > >> > > > > > > > > > > > (Kowshik): Done. > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > Cheers, > >> > > > > > > > > > > > Kowshik > >> > > > > > > > > > > > > >> > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao < > >> > > j...@confluent.io> > >> > > > > > > wrote: > >> > > > > > > > > > > > > >> > > > > > > > > > > > > Hi, Kowshik, > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks for the reply. A few more comments below. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to > >> > control > >> > > > who > >> > > > > > is > >> > > > > > > > > > allowed > >> > > > > > > > > > > to > >> > > > > > > > > > > > > issue that request if security is enabled. So, > we > >> > need > >> > > to > >> > > > > > > assign > >> > > > > > > > > the > >> > > > > > > > > > > new > >> > > > > > > > > > > > > request a ResourceType and possible > AclOperations. > >> > See > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment > >> > > > > > > > > > > > > as > >> > > > > > > > > > > > > an example. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 105. If we change delete to disable, it's better > >> to > >> > do > >> > > > this > >> > > > > > > > > > > consistently > >> > > > > > > > > > > > in > >> > > > > > > > > > > > > request protocol and admin api as well. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use > >> > int64. > >> > > > > > > Currently, > >> > > > > > > > > our > >> > > > > > > > > > > > > release version schema is major.minor.bugfix > (e.g. > >> > > > 2.5.0). > >> > > > > > It's > >> > > > > > > > > > > possible > >> > > > > > > > > > > > > for new features to be included in minor > releases > >> > too. > >> > > > > Should > >> > > > > > > we > >> > > > > > > > > make > >> > > > > > > > > > > the > >> > > > > > > > > > > > > feature versioning match the release versioning? > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 111. "During regular operations, the data in the > >> ZK > >> > > node > >> > > > > can > >> > > > > > be > >> > > > > > > > > > mutated > >> > > > > > > > > > > > > only via a specific admin API served only by the > >> > > > > > controller." I > >> > > > > > > > am > >> > > > > > > > > > > > > wondering why can't the controller auto > finalize a > >> > > > feature > >> > > > > > > > version > >> > > > > > > > > > > after > >> > > > > > > > > > > > > all brokers are upgraded? For new users who > >> download > >> > > the > >> > > > > > latest > >> > > > > > > > > > version > >> > > > > > > > > > > > to > >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them > to > >> > have > >> > > > to > >> > > > > > > > manually > >> > > > > > > > > > > > enable > >> > > > > > > > > > > > > each feature. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey > >> > should > >> > > > be > >> > > > > 49 > >> > > > > > > > > instead > >> > > > > > > > > > > of > >> > > > > > > > > > > > > 48. > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Jun > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam > < > >> > > > > > > > > > > kpraka...@confluent.io> > >> > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > Hey Jun, > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Thanks a lot for the great feedback! Please > note > >> > that > >> > > > the > >> > > > > > > > design > >> > > > > > > > > > > > > > has changed a little bit on the KIP, and we > now > >> > > > propagate > >> > > > > > the > >> > > > > > > > > > > finalized > >> > > > > > > > > > > > > > features metadata only via ZK watches (instead > >> of > >> > > > > > > > > > > UpdateMetadataRequest > >> > > > > > > > > > > > > > from the controller). > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Please find below my response to your > >> > > > questions/feedback, > >> > > > > > > with > >> > > > > > > > > the > >> > > > > > > > > > > > prefix > >> > > > > > > > > > > > > > "(Kowshik):". > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100. > >> UpdateFeaturesRequest/UpdateFeaturesResponse > >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses > >> from > >> > > > > > brokers, > >> > > > > > > > > should > >> > > > > > > > > > > we > >> > > > > > > > > > > > > add > >> > > > > > > > > > > > > > a > >> > > > > > > > > > > > > > > timeout in the request (like > >> createTopicRequest)? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a > >> > timeout > >> > > > > field. > >> > > > > > > > Note: > >> > > > > > > > > > we > >> > > > > > > > > > > no > >> > > > > > > > > > > > > > longer > >> > > > > > > > > > > > > > wait for responses from brokers, since the > >> design > >> > has > >> > > > > been > >> > > > > > > > > changed > >> > > > > > > > > > so > >> > > > > > > > > > > > > that > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > features information is propagated via ZK. > >> > > > Nevertheless, > >> > > > > it > >> > > > > > > is > >> > > > > > > > > > right > >> > > > > > > > > > > to > >> > > > > > > > > > > > > > have a timeout > >> > > > > > > > > > > > > > for the request. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird. > >> > > Typically, > >> > > > > the > >> > > > > > > > > response > >> > > > > > > > > > > > just > >> > > > > > > > > > > > > > > shows an error code and an error message, > >> instead > >> > > of > >> > > > > > > echoing > >> > > > > > > > > the > >> > > > > > > > > > > > > request. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified > >> it to > >> > > > just > >> > > > > > > return > >> > > > > > > > > an > >> > > > > > > > > > > > error > >> > > > > > > > > > > > > > code and a message. > >> > > > > > > > > > > > > > Previously it was not echoing the "request", > >> rather > >> > > it > >> > > > > was > >> > > > > > > > > > returning > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > latest set of > >> > > > > > > > > > > > > > cluster-wide finalized features (after > applying > >> the > >> > > > > > updates). > >> > > > > > > > But > >> > > > > > > > > > you > >> > > > > > > > > > > > are > >> > > > > > > > > > > > > > right, > >> > > > > > > > > > > > > > the additional info is not required, so I have > >> > > removed > >> > > > it > >> > > > > > > from > >> > > > > > > > > the > >> > > > > > > > > > > > > response > >> > > > > > > > > > > > > > schema. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to > >> > > > list/describe > >> > > > > > the > >> > > > > > > > > > > existing > >> > > > > > > > > > > > > > > features? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): This is already present in the KIP > >> via > >> > the > >> > > > > > > > > > > > 'DescribeFeatures' > >> > > > > > > > > > > > > > Admin API, > >> > > > > > > > > > > > > > which, underneath covers uses the > >> > ApiVersionsRequest > >> > > to > >> > > > > > > > > > list/describe > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > existing features. Please read the 'Tooling > >> > support' > >> > > > > > section. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE > >> in a > >> > > > > single > >> > > > > > > > > request. > >> > > > > > > > > > > For > >> > > > > > > > > > > > > > > DELETE, the version field doesn't make > sense. > >> > So, I > >> > > > > guess > >> > > > > > > the > >> > > > > > > > > > > broker > >> > > > > > > > > > > > > just > >> > > > > > > > > > > > > > > ignores this? An alternative way is to have > a > >> > > > separate > >> > > > > > > > > > > > > > DeleteFeaturesRequest > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the > KIP > >> now > >> > > to > >> > > > > > have 2 > >> > > > > > > > > > > separate > >> > > > > > > > > > > > > > controller APIs > >> > > > > > > > > > > > > > serving these different purposes: > >> > > > > > > > > > > > > > 1. updateFeatures > >> > > > > > > > > > > > > > 2. deleteFeatures > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have > "The > >> > > > > > monotonically > >> > > > > > > > > > > > increasing > >> > > > > > > > > > > > > > > version of the metadata for finalized > >> features." > >> > I > >> > > am > >> > > > > > > > wondering > >> > > > > > > > > > why > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > ordering is important? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is > >> called > >> > > > epoch > >> > > > > > > > > (instead > >> > > > > > > > > > of > >> > > > > > > > > > > > > > version), and > >> > > > > > > > > > > > > > it is just the ZK node version. Basically, > this > >> is > >> > > the > >> > > > > > epoch > >> > > > > > > > for > >> > > > > > > > > > the > >> > > > > > > > > > > > > > cluster-wide > >> > > > > > > > > > > > > > finalized feature version metadata. This > >> metadata > >> > is > >> > > > > served > >> > > > > > > to > >> > > > > > > > > > > clients > >> > > > > > > > > > > > > via > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate > >> > updates > >> > > > > from > >> > > > > > > the > >> > > > > > > > > > > > > '/features' > >> > > > > > > > > > > > > > ZK node > >> > > > > > > > > > > > > > to all brokers, via ZK watches setup by each > >> broker > >> > > on > >> > > > > the > >> > > > > > > > > > > '/features' > >> > > > > > > > > > > > > > node. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Now here is why the ordering is important: > >> > > > > > > > > > > > > > ZK watches don't propagate at the same time. > As > >> a > >> > > > result, > >> > > > > > the > >> > > > > > > > > > > > > > ApiVersionsResponse > >> > > > > > > > > > > > > > is eventually consistent across brokers. This > >> can > >> > > > > introduce > >> > > > > > > > cases > >> > > > > > > > > > > > > > where clients see an older lower epoch of the > >> > > features > >> > > > > > > > metadata, > >> > > > > > > > > > > after > >> > > > > > > > > > > > a > >> > > > > > > > > > > > > > more recent > >> > > > > > > > > > > > > > higher epoch was returned at a previous point > in > >> > > time. > >> > > > We > >> > > > > > > > expect > >> > > > > > > > > > > > clients > >> > > > > > > > > > > > > > to always employ the rule that the latest > >> received > >> > > > higher > >> > > > > > > epoch > >> > > > > > > > > of > >> > > > > > > > > > > > > metadata > >> > > > > > > > > > > > > > always trumps an older smaller epoch. Those > >> clients > >> > > > that > >> > > > > > are > >> > > > > > > > > > external > >> > > > > > > > > > > > to > >> > > > > > > > > > > > > > Kafka should strongly consider discovering the > >> > latest > >> > > > > > > metadata > >> > > > > > > > > once > >> > > > > > > > > > > > > during > >> > > > > > > > > > > > > > startup from the brokers, and if required > >> refresh > >> > the > >> > > > > > > metadata > >> > > > > > > > > > > > > periodically > >> > > > > > > > > > > > > > (to get the latest metadata). > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for > >> this > >> > > new > >> > > > > > > > request? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find > out > >> > > which > >> > > > > one > >> > > > > > to > >> > > > > > > > > > > specify? > >> > > > > > > > > > > > > > Please could you provide me some pointers? > I'll > >> be > >> > > glad > >> > > > > to > >> > > > > > > > update > >> > > > > > > > > > the > >> > > > > > > > > > > > > > KIP once I know the next steps. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 101. For the broker registration ZK node, > >> should > >> > we > >> > > > > bump > >> > > > > > up > >> > > > > > > > the > >> > > > > > > > > > > > version > >> > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > the json? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased > the > >> > > > version > >> > > > > in > >> > > > > > > the > >> > > > > > > > > > > broker > >> > > > > > > > > > > > > json > >> > > > > > > > > > > > > > by 1. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if > we > >> > need > >> > > > the > >> > > > > > > epoch > >> > > > > > > > > > > field. > >> > > > > > > > > > > > > Each > >> > > > > > > > > > > > > > > ZK node has an internal version field that > is > >> > > > > incremented > >> > > > > > > on > >> > > > > > > > > > every > >> > > > > > > > > > > > > > update. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK > >> node > >> > > > > version > >> > > > > > > > now, > >> > > > > > > > > > > > instead > >> > > > > > > > > > > > > of > >> > > > > > > > > > > > > > explicitly > >> > > > > > > > > > > > > > incremented epoch. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a > >> feature > >> > > > > version > >> > > > > > > > > > > cluster-wide > >> > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > left to the discretion of the logic > >> implementing > >> > > the > >> > > > > > > feature > >> > > > > > > > > (ex: > >> > > > > > > > > > > can > >> > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that > >> mean > >> > > the > >> > > > > > broker > >> > > > > > > > > > > > > registration > >> > > > > > > > > > > > > > ZK > >> > > > > > > > > > > > > > > node will be updated dynamically when this > >> > happens? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): Not really. The text was just > >> conveying > >> > > > that a > >> > > > > > > > broker > >> > > > > > > > > > > could > >> > > > > > > > > > > > > > "know" of > >> > > > > > > > > > > > > > a new feature version, but it does not mean > the > >> > > broker > >> > > > > > should > >> > > > > > > > > have > >> > > > > > > > > > > also > >> > > > > > > > > > > > > > activated the effects of the feature version. > >> > Knowing > >> > > > vs > >> > > > > > > > > activation > >> > > > > > > > > > > > are 2 > >> > > > > > > > > > > > > > separate things, > >> > > > > > > > > > > > > > and the latter can be achieved by dynamic > >> config. I > >> > > > have > >> > > > > > > > reworded > >> > > > > > > > > > the > >> > > > > > > > > > > > > text > >> > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > make this clear to the reader. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest > >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when > the > >> > > feature > >> > > > > > > > metadata > >> > > > > > > > > is > >> > > > > > > > > > > > > > included > >> > > > > > > > > > > > > > > in the request. My understanding is that > it's > >> > only > >> > > > > > included > >> > > > > > > > if > >> > > > > > > > > > (1) > >> > > > > > > > > > > > > there > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > a change to the finalized feature; (2) > broker > >> > > > restart; > >> > > > > > (3) > >> > > > > > > > > > > controller > >> > > > > > > > > > > > > > > failover. > >> > > > > > > > > > > > > > > 104.2 The new fields have the following > >> versions. > >> > > Why > >> > > > > are > >> > > > > > > the > >> > > > > > > > > > > > versions > >> > > > > > > > > > > > > 3+ > >> > > > > > > > > > > > > > > when the top version is bumped to 6? > >> > > > > > > > > > > > > > > "fields": [ > >> > > > > > > > > > > > > > > {"name": "Name", "type": "string", > >> > > > "versions": > >> > > > > > > > "3+", > >> > > > > > > > > > > > > > > "about": "The name of the > >> feature."}, > >> > > > > > > > > > > > > > > {"name": "Version", "type": > "int64", > >> > > > > > "versions": > >> > > > > > > > > "3+", > >> > > > > > > > > > > > > > > "about": "The finalized version > for > >> the > >> > > > > > > feature."} > >> > > > > > > > > > > > > > > ] > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): With the new improved design, we > have > >> > > > > completely > >> > > > > > > > > > > eliminated > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > need to > >> > > > > > > > > > > > > > use UpdateMetadataRequest. This is because we > >> now > >> > > rely > >> > > > on > >> > > > > > ZK > >> > > > > > > to > >> > > > > > > > > > > deliver > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > notifications for changes to the '/features' > ZK > >> > node. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using > >> > > > update/delete, > >> > > > > > > > perhaps > >> > > > > > > > > > > it's > >> > > > > > > > > > > > > > better > >> > > > > > > > > > > > > > > to use enable/disable? > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it > so > >> > that > >> > > > we > >> > > > > > > > instead > >> > > > > > > > > > call > >> > > > > > > > > > > > it > >> > > > > > > > > > > > > > 'disable'. > >> > > > > > > > > > > > > > However for 'update', it can now also refer to > >> > either > >> > > > an > >> > > > > > > > upgrade > >> > > > > > > > > > or a > >> > > > > > > > > > > > > > forced downgrade. > >> > > > > > > > > > > > > > Therefore, I have left it the way it is, just > >> > calling > >> > > > it > >> > > > > as > >> > > > > > > > just > >> > > > > > > > > > > > > 'update'. > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Cheers, > >> > > > > > > > > > > > > > Kowshik > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao < > >> > > > > j...@confluent.io> > >> > > > > > > > > wrote: > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Hi, Kowshik, > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A > few > >> > > > comments > >> > > > > > > below. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 100. > >> UpdateFeaturesRequest/UpdateFeaturesResponse > >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses > >> from > >> > > > > > brokers, > >> > > > > > > > > should > >> > > > > > > > > > > we > >> > > > > > > > > > > > > add > >> > > > > > > > > > > > > > a > >> > > > > > > > > > > > > > > timeout in the request (like > >> createTopicRequest)? > >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird. > >> > > Typically, > >> > > > > the > >> > > > > > > > > response > >> > > > > > > > > > > > just > >> > > > > > > > > > > > > > > shows an error code and an error message, > >> instead > >> > > of > >> > > > > > > echoing > >> > > > > > > > > the > >> > > > > > > > > > > > > request. > >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to > >> > > > list/describe > >> > > > > > the > >> > > > > > > > > > > existing > >> > > > > > > > > > > > > > > features? > >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE > >> in a > >> > > > > single > >> > > > > > > > > request. > >> > > > > > > > > > > For > >> > > > > > > > > > > > > > > DELETE, the version field doesn't make > sense. > >> > So, I > >> > > > > guess > >> > > > > > > the > >> > > > > > > > > > > broker > >> > > > > > > > > > > > > just > >> > > > > > > > > > > > > > > ignores this? An alternative way is to have > a > >> > > > separate > >> > > > > > > > > > > > > > > DeleteFeaturesRequest > >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have > "The > >> > > > > > monotonically > >> > > > > > > > > > > > increasing > >> > > > > > > > > > > > > > > version of the metadata for finalized > >> features." > >> > I > >> > > am > >> > > > > > > > wondering > >> > > > > > > > > > why > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > ordering is important? > >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for > >> this > >> > > new > >> > > > > > > > request? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 101. For the broker registration ZK node, > >> should > >> > we > >> > > > > bump > >> > > > > > up > >> > > > > > > > the > >> > > > > > > > > > > > version > >> > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > the json? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if > we > >> > need > >> > > > the > >> > > > > > > epoch > >> > > > > > > > > > > field. > >> > > > > > > > > > > > > Each > >> > > > > > > > > > > > > > > ZK node has an internal version field that > is > >> > > > > incremented > >> > > > > > > on > >> > > > > > > > > > every > >> > > > > > > > > > > > > > update. > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a > >> feature > >> > > > > version > >> > > > > > > > > > > cluster-wide > >> > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > left to the discretion of the logic > >> implementing > >> > > the > >> > > > > > > feature > >> > > > > > > > > (ex: > >> > > > > > > > > > > can > >> > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that > >> mean > >> > > the > >> > > > > > broker > >> > > > > > > > > > > > > registration > >> > > > > > > > > > > > > > ZK > >> > > > > > > > > > > > > > > node will be updated dynamically when this > >> > happens? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest > >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when > the > >> > > feature > >> > > > > > > > metadata > >> > > > > > > > > is > >> > > > > > > > > > > > > > included > >> > > > > > > > > > > > > > > in the request. My understanding is that > it's > >> > only > >> > > > > > included > >> > > > > > > > if > >> > > > > > > > > > (1) > >> > > > > > > > > > > > > there > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > a change to the finalized feature; (2) > broker > >> > > > restart; > >> > > > > > (3) > >> > > > > > > > > > > controller > >> > > > > > > > > > > > > > > failover. > >> > > > > > > > > > > > > > > 104.2 The new fields have the following > >> versions. > >> > > Why > >> > > > > are > >> > > > > > > the > >> > > > > > > > > > > > versions > >> > > > > > > > > > > > > 3+ > >> > > > > > > > > > > > > > > when the top version is bumped to 6? > >> > > > > > > > > > > > > > > "fields": [ > >> > > > > > > > > > > > > > > {"name": "Name", "type": "string", > >> > > > "versions": > >> > > > > > > > "3+", > >> > > > > > > > > > > > > > > "about": "The name of the > >> feature."}, > >> > > > > > > > > > > > > > > {"name": "Version", "type": > "int64", > >> > > > > > "versions": > >> > > > > > > > > "3+", > >> > > > > > > > > > > > > > > "about": "The finalized version > for > >> the > >> > > > > > > feature."} > >> > > > > > > > > > > > > > > ] > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using > >> > > > update/delete, > >> > > > > > > > perhaps > >> > > > > > > > > > > it's > >> > > > > > > > > > > > > > better > >> > > > > > > > > > > > > > > to use enable/disable? > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Jun > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik > >> Prakasam > >> > < > >> > > > > > > > > > > > > kpraka...@confluent.io > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hey Boyang, > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks for the great feedback! I have > >> updated > >> > the > >> > > > KIP > >> > > > > > > based > >> > > > > > > > > on > >> > > > > > > > > > > your > >> > > > > > > > > > > > > > > > feedback. > >> > > > > > > > > > > > > > > > Please find my response below for your > >> > comments, > >> > > > look > >> > > > > > for > >> > > > > > > > > > > sentences > >> > > > > > > > > > > > > > > > starting > >> > > > > > > > > > > > > > > > with "(Kowshik)" below. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to > >> begin > >> > > > > handling > >> > > > > > > EOS > >> > > > > > > > > > > > traffic" > >> > > > > > > > > > > > > > > could > >> > > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > converted as "When is it safe for the > >> brokers > >> > > to > >> > > > > > start > >> > > > > > > > > > serving > >> > > > > > > > > > > > new > >> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS > is > >> not > >> > > > > > explained > >> > > > > > > > > > earlier > >> > > > > > > > > > > > in > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > context. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the > >> metadata > >> > > > > version > >> > > > > > > > > number > >> > > > > > > > > > > part > >> > > > > > > > > > > > > > > seems a > >> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference > >> to > >> > > later > >> > > > > > > section > >> > > > > > > > > > that > >> > > > > > > > > > > we > >> > > > > > > > > > > > > > going > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > store it in Zookeeper and update it > every > >> > time > >> > > > when > >> > > > > > > there > >> > > > > > > > > is > >> > > > > > > > > > a > >> > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > change? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a > >> > > > reference > >> > > > > in > >> > > > > > > the > >> > > > > > > > > > KIP. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although > >> it's a > >> > > > > > Non-goal > >> > > > > > > of > >> > > > > > > > > the > >> > > > > > > > > > > > KIP, > >> > > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > > features such as group coordinator > >> semantics, > >> > > > there > >> > > > > > is > >> > > > > > > no > >> > > > > > > > > > legal > >> > > > > > > > > > > > > > > scenario > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > perform a downgrade at all. So having > >> > downgrade > >> > > > > door > >> > > > > > > open > >> > > > > > > > > is > >> > > > > > > > > > > > pretty > >> > > > > > > > > > > > > > > > > error-prone as human faults happen all > the > >> > > time. > >> > > > > I'm > >> > > > > > > > > assuming > >> > > > > > > > > > > as > >> > > > > > > > > > > > > new > >> > > > > > > > > > > > > > > > > features are implemented, it's not very > >> hard > >> > to > >> > > > > add a > >> > > > > > > > flag > >> > > > > > > > > > > during > >> > > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > creation to indicate whether this > feature > >> is > >> > > > > > > > > "downgradable". > >> > > > > > > > > > > > Could > >> > > > > > > > > > > > > > you > >> > > > > > > > > > > > > > > > > explain a bit more on the extra > >> engineering > >> > > > effort > >> > > > > > for > >> > > > > > > > > > shipping > >> > > > > > > > > > > > > this > >> > > > > > > > > > > > > > > KIP > >> > > > > > > > > > > > > > > > > with downgrade protection in place? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and > >> disagree > >> > > > here. > >> > > > > > > While > >> > > > > > > > I > >> > > > > > > > > > > agree > >> > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > accidental > >> > > > > > > > > > > > > > > > downgrades can cause problems, I also > think > >> > > > sometimes > >> > > > > > > > > > downgrades > >> > > > > > > > > > > > > should > >> > > > > > > > > > > > > > > > be allowed for emergency reasons (not all > >> > > > downgrades > >> > > > > > > cause > >> > > > > > > > > > > issues). > >> > > > > > > > > > > > > > > > It is just subjective to the feature being > >> > > > > downgraded. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > To be more strict about feature version > >> > > > downgrades, I > >> > > > > > > have > >> > > > > > > > > > > modified > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > KIP > >> > > > > > > > > > > > > > > > proposing that we mandate a > >> `--force-downgrade` > >> > > > flag > >> > > > > be > >> > > > > > > > used > >> > > > > > > > > in > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > > UPDATE_FEATURES api > >> > > > > > > > > > > > > > > > and the tooling, whenever the human is > >> > > downgrading > >> > > > a > >> > > > > > > > > finalized > >> > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > version. > >> > > > > > > > > > > > > > > > Hopefully this should cover the > requirement, > >> > > until > >> > > > we > >> > > > > > > find > >> > > > > > > > > the > >> > > > > > > > > > > need > >> > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > advanced downgrade support. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary > of > >> > > feature > >> > > > > > > > versions > >> > > > > > > > > > will > >> > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > defined > >> > > > > > > > > > > > > > > > > in the broker code." So this means in > >> order > >> > to > >> > > > > > > restrict a > >> > > > > > > > > > > certain > >> > > > > > > > > > > > > > > > feature, > >> > > > > > > > > > > > > > > > > we need to start the broker first and > then > >> > > send a > >> > > > > > > feature > >> > > > > > > > > > > gating > >> > > > > > > > > > > > > > > request > >> > > > > > > > > > > > > > > > > immediately, which introduces a time gap > >> and > >> > > the > >> > > > > > > > > > > > intended-to-close > >> > > > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > could actually serve request during this > >> > phase. > >> > > > Do > >> > > > > > you > >> > > > > > > > > think > >> > > > > > > > > > we > >> > > > > > > > > > > > > > should > >> > > > > > > > > > > > > > > > also > >> > > > > > > > > > > > > > > > > support configurations as well so that > >> admin > >> > > user > >> > > > > > could > >> > > > > > > > > > freely > >> > > > > > > > > > > > roll > >> > > > > > > > > > > > > > up > >> > > > > > > > > > > > > > > a > >> > > > > > > > > > > > > > > > > cluster with all nodes complying the > same > >> > > feature > >> > > > > > > gating, > >> > > > > > > > > > > without > >> > > > > > > > > > > > > > > > worrying > >> > > > > > > > > > > > > > > > > about the turnaround time to propagate > the > >> > > > message > >> > > > > > only > >> > > > > > > > > after > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > cluster > >> > > > > > > > > > > > > > > > > starts up? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): This is a great point/question. > >> One > >> > of > >> > > > the > >> > > > > > > > > > > expectations > >> > > > > > > > > > > > > out > >> > > > > > > > > > > > > > of > >> > > > > > > > > > > > > > > > this KIP, which is > >> > > > > > > > > > > > > > > > already followed in the broker, is the > >> > following. > >> > > > > > > > > > > > > > > > - Imagine at time T1 the broker starts up > >> and > >> > > > > > registers > >> > > > > > > > it’s > >> > > > > > > > > > > > > presence > >> > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > ZK, > >> > > > > > > > > > > > > > > > along with advertising it’s supported > >> > > features. > >> > > > > > > > > > > > > > > > - Imagine at a future time T2 the broker > >> > > receives > >> > > > > the > >> > > > > > > > > > > > > > > > UpdateMetadataRequest > >> > > > > > > > > > > > > > > > from the controller, which contains the > >> > latest > >> > > > > > > finalized > >> > > > > > > > > > > > features > >> > > > > > > > > > > > > as > >> > > > > > > > > > > > > > > > seen by > >> > > > > > > > > > > > > > > > the controller. The broker validates > this > >> > data > >> > > > > > against > >> > > > > > > > > it’s > >> > > > > > > > > > > > > > supported > >> > > > > > > > > > > > > > > > features to > >> > > > > > > > > > > > > > > > make sure there is no mismatch (it will > >> > > shutdown > >> > > > > if > >> > > > > > > > there > >> > > > > > > > > is > >> > > > > > > > > > > an > >> > > > > > > > > > > > > > > > incompatibility). > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > It is expected that during the time > between > >> > the 2 > >> > > > > > events > >> > > > > > > T1 > >> > > > > > > > > and > >> > > > > > > > > > > T2, > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > broker is > >> > > > > > > > > > > > > > > > almost a silent entity in the cluster. It > >> does > >> > > not > >> > > > > add > >> > > > > > > any > >> > > > > > > > > > value > >> > > > > > > > > > > to > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > cluster, or carry > >> > > > > > > > > > > > > > > > out any important broker activities. By > >> > > > “important”, > >> > > > > I > >> > > > > > > mean > >> > > > > > > > > it > >> > > > > > > > > > is > >> > > > > > > > > > > > not > >> > > > > > > > > > > > > > > doing > >> > > > > > > > > > > > > > > > mutations > >> > > > > > > > > > > > > > > > on it’s persistence, not mutating critical > >> > > > in-memory > >> > > > > > > state, > >> > > > > > > > > > won’t > >> > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > serving > >> > > > > > > > > > > > > > > > produce/fetch requests. Note it doesn’t > even > >> > know > >> > > > > it’s > >> > > > > > > > > assigned > >> > > > > > > > > > > > > > > partitions > >> > > > > > > > > > > > > > > > until > >> > > > > > > > > > > > > > > > it receives UpdateMetadataRequest from > >> > > controller. > >> > > > > > > Anything > >> > > > > > > > > the > >> > > > > > > > > > > > > broker > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > > doing up > >> > > > > > > > > > > > > > > > until this point is not damaging/useful. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see > >> this > >> > new > >> > > > > > > section: > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime > >> > > > > > > > > > > > > > > > . > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or > >> > deleting > >> > > an > >> > > > > > > > existing > >> > > > > > > > > > > > > Feature", > >> > > > > > > > > > > > > > > may > >> > > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > I misunderstood something, I thought the > >> > > features > >> > > > > are > >> > > > > > > > > defined > >> > > > > > > > > > > in > >> > > > > > > > > > > > > > broker > >> > > > > > > > > > > > > > > > > code, so admin could not really create a > >> new > >> > > > > feature? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! You understood > this > >> > > right. > >> > > > > Here > >> > > > > > > > > adding > >> > > > > > > > > > a > >> > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > means we are > >> > > > > > > > > > > > > > > > adding a cluster-wide finalized *max* > >> version > >> > > for a > >> > > > > > > feature > >> > > > > > > > > > that > >> > > > > > > > > > > > was > >> > > > > > > > > > > > > > > > previously never finalized. > >> > > > > > > > > > > > > > > > I have clarified this in the KIP now. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 6. I think we need a separate error code > >> like > >> > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > reject a concurrent feature update > >> request. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified > the > >> KIP > >> > > > > adding > >> > > > > > > the > >> > > > > > > > > > above > >> > > > > > > > > > > > (see > >> > > > > > > > > > > > > > > > 'Tooling support -> Admin API changes'). > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 7. I think we haven't discussed the > >> > alternative > >> > > > > > > solution > >> > > > > > > > to > >> > > > > > > > > > > pass > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > feature information through Zookeeper. > Is > >> > that > >> > > > > > > mentioned > >> > > > > > > > in > >> > > > > > > > > > the > >> > > > > > > > > > > > KIP > >> > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more > >> > > > favorable? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads > >> > > > finalized > >> > > > > > > > feature > >> > > > > > > > > > info > >> > > > > > > > > > > > > > stored > >> > > > > > > > > > > > > > > in > >> > > > > > > > > > > > > > > > ZK, > >> > > > > > > > > > > > > > > > only during startup when it does a > >> validation. > >> > > When > >> > > > > > > serving > >> > > > > > > > > > > > > > > > `ApiVersionsRequest`, the > >> > > > > > > > > > > > > > > > broker does not read this info from ZK > >> > directly. > >> > > > I'd > >> > > > > > > > imagine > >> > > > > > > > > > the > >> > > > > > > > > > > > risk > >> > > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > > that it can increase > >> > > > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck > >> for > >> > the > >> > > > > > system. > >> > > > > > > > > > Today, > >> > > > > > > > > > > in > >> > > > > > > > > > > > > > Kafka > >> > > > > > > > > > > > > > > > we use the > >> > > > > > > > > > > > > > > > controller to fan out ZK updates to > brokers > >> and > >> > > we > >> > > > > want > >> > > > > > > to > >> > > > > > > > > > stick > >> > > > > > > > > > > to > >> > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > pattern to avoid > >> > > > > > > > > > > > > > > > the ZK read bottleneck when serving > >> > > > > > `ApiVersionsRequest`. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 8. I was under the impression that user > >> could > >> > > > > > > configure a > >> > > > > > > > > > range > >> > > > > > > > > > > > of > >> > > > > > > > > > > > > > > > > supported versions, what's the trade-off > >> for > >> > > > > allowing > >> > > > > > > > > single > >> > > > > > > > > > > > > > finalized > >> > > > > > > > > > > > > > > > > version only? > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great question! The finalized > >> > version > >> > > > of a > >> > > > > > > > feature > >> > > > > > > > > > > > > basically > >> > > > > > > > > > > > > > > > refers to > >> > > > > > > > > > > > > > > > the cluster-wide finalized feature > "maximum" > >> > > > version. > >> > > > > > For > >> > > > > > > > > > > example, > >> > > > > > > > > > > > if > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > 'group_coordinator' feature > >> > > > > > > > > > > > > > > > has the finalized version set to 10, then, > >> it > >> > > means > >> > > > > > that > >> > > > > > > > > > > > cluster-wide > >> > > > > > > > > > > > > > all > >> > > > > > > > > > > > > > > > versions upto v10 are > >> > > > > > > > > > > > > > > > supported for this feature. However, note > >> that > >> > if > >> > > > > some > >> > > > > > > > > version > >> > > > > > > > > > > (ex: > >> > > > > > > > > > > > > v0) > >> > > > > > > > > > > > > > > > gets deprecated > >> > > > > > > > > > > > > > > > for this feature, then we don’t convey > that > >> > using > >> > > > > this > >> > > > > > > > scheme > >> > > > > > > > > > > (also > >> > > > > > > > > > > > > > > > supporting deprecation is a non-goal). > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at > all > >> > > points, > >> > > > > > > > refering > >> > > > > > > > > to > >> > > > > > > > > > > > > > finalized > >> > > > > > > > > > > > > > > > feature "maximum" versions. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here > >> the > >> > > > > "client" > >> > > > > > > here > >> > > > > > > > > may > >> > > > > > > > > > > be > >> > > > > > > > > > > > a > >> > > > > > > > > > > > > > > > producer > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Cheers, > >> > > > > > > > > > > > > > > > Kowshik > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang > Chen > >> < > >> > > > > > > > > > > > > > reluctanthero...@gmail.com> > >> > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Hey Kowshik, > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple > >> of > >> > > > > > questions: > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to > >> begin > >> > > > > handling > >> > > > > > > EOS > >> > > > > > > > > > > > traffic" > >> > > > > > > > > > > > > > > could > >> > > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > converted as "When is it safe for the > >> brokers > >> > > to > >> > > > > > start > >> > > > > > > > > > serving > >> > > > > > > > > > > > new > >> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS > is > >> not > >> > > > > > explained > >> > > > > > > > > > earlier > >> > > > > > > > > > > > in > >> > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > context. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the > >> metadata > >> > > > > version > >> > > > > > > > > number > >> > > > > > > > > > > part > >> > > > > > > > > > > > > > > seems a > >> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference > >> to > >> > > later > >> > > > > > > section > >> > > > > > > > > > that > >> > > > > > > > > > > we > >> > > > > > > > > > > > > > going > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > store it in Zookeeper and update it > every > >> > time > >> > > > when > >> > > > > > > there > >> > > > > > > > > is > >> > > > > > > > > > a > >> > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > change? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although > >> it's a > >> > > > > > Non-goal > >> > > > > > > of > >> > > > > > > > > the > >> > > > > > > > > > > > KIP, > >> > > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > > features such as group coordinator > >> semantics, > >> > > > there > >> > > > > > is > >> > > > > > > no > >> > > > > > > > > > legal > >> > > > > > > > > > > > > > > scenario > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > perform a downgrade at all. So having > >> > downgrade > >> > > > > door > >> > > > > > > open > >> > > > > > > > > is > >> > > > > > > > > > > > pretty > >> > > > > > > > > > > > > > > > > error-prone as human faults happen all > the > >> > > time. > >> > > > > I'm > >> > > > > > > > > assuming > >> > > > > > > > > > > as > >> > > > > > > > > > > > > new > >> > > > > > > > > > > > > > > > > features are implemented, it's not very > >> hard > >> > to > >> > > > > add a > >> > > > > > > > flag > >> > > > > > > > > > > during > >> > > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > creation to indicate whether this > feature > >> is > >> > > > > > > > > "downgradable". > >> > > > > > > > > > > > Could > >> > > > > > > > > > > > > > you > >> > > > > > > > > > > > > > > > > explain a bit more on the extra > >> engineering > >> > > > effort > >> > > > > > for > >> > > > > > > > > > shipping > >> > > > > > > > > > > > > this > >> > > > > > > > > > > > > > > KIP > >> > > > > > > > > > > > > > > > > with downgrade protection in place? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary > of > >> > > feature > >> > > > > > > > versions > >> > > > > > > > > > will > >> > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > defined > >> > > > > > > > > > > > > > > > > in the broker code." So this means in > >> order > >> > to > >> > > > > > > restrict a > >> > > > > > > > > > > certain > >> > > > > > > > > > > > > > > > feature, > >> > > > > > > > > > > > > > > > > we need to start the broker first and > then > >> > > send a > >> > > > > > > feature > >> > > > > > > > > > > gating > >> > > > > > > > > > > > > > > request > >> > > > > > > > > > > > > > > > > immediately, which introduces a time gap > >> and > >> > > the > >> > > > > > > > > > > > intended-to-close > >> > > > > > > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > > could actually serve request during this > >> > phase. > >> > > > Do > >> > > > > > you > >> > > > > > > > > think > >> > > > > > > > > > we > >> > > > > > > > > > > > > > should > >> > > > > > > > > > > > > > > > also > >> > > > > > > > > > > > > > > > > support configurations as well so that > >> admin > >> > > user > >> > > > > > could > >> > > > > > > > > > freely > >> > > > > > > > > > > > roll > >> > > > > > > > > > > > > > up > >> > > > > > > > > > > > > > > a > >> > > > > > > > > > > > > > > > > cluster with all nodes complying the > same > >> > > feature > >> > > > > > > gating, > >> > > > > > > > > > > without > >> > > > > > > > > > > > > > > > worrying > >> > > > > > > > > > > > > > > > > about the turnaround time to propagate > the > >> > > > message > >> > > > > > only > >> > > > > > > > > after > >> > > > > > > > > > > the > >> > > > > > > > > > > > > > > cluster > >> > > > > > > > > > > > > > > > > starts up? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or > >> > deleting > >> > > an > >> > > > > > > > existing > >> > > > > > > > > > > > > Feature", > >> > > > > > > > > > > > > > > may > >> > > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > I misunderstood something, I thought the > >> > > features > >> > > > > are > >> > > > > > > > > defined > >> > > > > > > > > > > in > >> > > > > > > > > > > > > > broker > >> > > > > > > > > > > > > > > > > code, so admin could not really create a > >> new > >> > > > > feature? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 6. I think we need a separate error code > >> like > >> > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS > >> > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > reject a concurrent feature update > >> request. > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 7. I think we haven't discussed the > >> > alternative > >> > > > > > > solution > >> > > > > > > > to > >> > > > > > > > > > > pass > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > feature information through Zookeeper. > Is > >> > that > >> > > > > > > mentioned > >> > > > > > > > in > >> > > > > > > > > > the > >> > > > > > > > > > > > KIP > >> > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more > >> > > > favorable? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 8. I was under the impression that user > >> could > >> > > > > > > configure a > >> > > > > > > > > > range > >> > > > > > > > > > > > of > >> > > > > > > > > > > > > > > > > supported versions, what's the trade-off > >> for > >> > > > > allowing > >> > > > > > > > > single > >> > > > > > > > > > > > > > finalized > >> > > > > > > > > > > > > > > > > version only? > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here > >> the > >> > > > > "client" > >> > > > > > > here > >> > > > > > > > > may > >> > > > > > > > > > > be > >> > > > > > > > > > > > a > >> > > > > > > > > > > > > > > > producer > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Boyang > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin > >> McCabe > >> > < > >> > > > > > > > > > > cmcc...@apache.org > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24, > Kowshik > >> > > > Prakasam > >> > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > Hi Colin, > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Thanks for the feedback! I've > changed > >> the > >> > > KIP > >> > > > > to > >> > > > > > > > > address > >> > > > > > > > > > > your > >> > > > > > > > > > > > > > > > > > > suggestions. > >> > > > > > > > > > > > > > > > > > > Please find below my explanation. > Here > >> > is a > >> > > > > link > >> > > > > > to > >> > > > > > > > KIP > >> > > > > > > > > > > 584: > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > >> > > > > > > > > > > > > > > > > > > . > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > 1. '__data_version__' is the version > >> of > >> > the > >> > > > > > > finalized > >> > > > > > > > > > > feature > >> > > > > > > > > > > > > > > > metadata > >> > > > > > > > > > > > > > > > > > > (i.e. actual ZK node contents), > while > >> the > >> > > > > > > > > > > > '__schema_version__' > >> > > > > > > > > > > > > is > >> > > > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > > version of the schema of the data > >> > persisted > >> > > > in > >> > > > > > ZK. > >> > > > > > > > > These > >> > > > > > > > > > > > serve > >> > > > > > > > > > > > > > > > > different > >> > > > > > > > > > > > > > > > > > > purposes. '__data_version__' is is > >> useful > >> > > > > mainly > >> > > > > > to > >> > > > > > > > > > clients > >> > > > > > > > > > > > > > during > >> > > > > > > > > > > > > > > > > reads, > >> > > > > > > > > > > > > > > > > > > to differentiate between the 2 > >> versions > >> > of > >> > > > > > > eventually > >> > > > > > > > > > > > > consistent > >> > > > > > > > > > > > > > > > > > 'finalized > >> > > > > > > > > > > > > > > > > > > features' metadata (i.e. larger > >> metadata > >> > > > > version > >> > > > > > is > >> > > > > > > > > more > >> > > > > > > > > > > > > recent). > >> > > > > > > > > > > > > > > > > > > '__schema_version__' provides an > >> > additional > >> > > > > > degree > >> > > > > > > of > >> > > > > > > > > > > > > > flexibility, > >> > > > > > > > > > > > > > > > > where > >> > > > > > > > > > > > > > > > > > if > >> > > > > > > > > > > > > > > > > > > we decide to change the schema for > >> > > > '/features' > >> > > > > > node > >> > > > > > > > in > >> > > > > > > > > ZK > >> > > > > > > > > > > (in > >> > > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > > > future), > >> > > > > > > > > > > > > > > > > > > then we can manage broker roll outs > >> > > suitably > >> > > > > > (i.e. > >> > > > > > > > > > > > > > > > > > > serialization/deserialization of the > >> ZK > >> > > data > >> > > > > can > >> > > > > > be > >> > > > > > > > > > handled > >> > > > > > > > > > > > > > > safely). > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi Kowshik, > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > If you're talking about a number that > >> lets > >> > > you > >> > > > > know > >> > > > > > > if > >> > > > > > > > > data > >> > > > > > > > > > > is > >> > > > > > > > > > > > > more > >> > > > > > > > > > > > > > > or > >> > > > > > > > > > > > > > > > > > less recent, we would typically call > >> that > >> > an > >> > > > > epoch, > >> > > > > > > and > >> > > > > > > > > > not a > >> > > > > > > > > > > > > > > version. > >> > > > > > > > > > > > > > > > > For > >> > > > > > > > > > > > > > > > > > the ZK data structures, the word > >> "version" > >> > is > >> > > > > > > typically > >> > > > > > > > > > > > reserved > >> > > > > > > > > > > > > > for > >> > > > > > > > > > > > > > > > > > describing changes to the overall > >> schema of > >> > > the > >> > > > > > data > >> > > > > > > > that > >> > > > > > > > > > is > >> > > > > > > > > > > > > > written > >> > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > > ZooKeeper. We don't even really > change > >> the > >> > > > > > "version" > >> > > > > > > > of > >> > > > > > > > > > > those > >> > > > > > > > > > > > > > > schemas > >> > > > > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > > > much, since most changes are > >> > > > > backwards-compatible. > >> > > > > > > But > >> > > > > > > > > we > >> > > > > > > > > > do > >> > > > > > > > > > > > > > include > >> > > > > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > > > version field just in case. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > I don't think we really need an epoch > >> here, > >> > > > > though, > >> > > > > > > > since > >> > > > > > > > > > we > >> > > > > > > > > > > > can > >> > > > > > > > > > > > > > just > >> > > > > > > > > > > > > > > > > look > >> > > > > > > > > > > > > > > > > > at the broker epoch. Whenever the > >> broker > >> > > > > > registers, > >> > > > > > > > its > >> > > > > > > > > > > epoch > >> > > > > > > > > > > > > will > >> > > > > > > > > > > > > > > be > >> > > > > > > > > > > > > > > > > > greater than the previous broker > epoch. > >> > And > >> > > > the > >> > > > > > > newly > >> > > > > > > > > > > > registered > >> > > > > > > > > > > > > > > data > >> > > > > > > > > > > > > > > > > will > >> > > > > > > > > > > > > > > > > > take priority. This will be a lot > >> simpler > >> > > than > >> > > > > > > adding > >> > > > > > > > a > >> > > > > > > > > > > > separate > >> > > > > > > > > > > > > > > epoch > >> > > > > > > > > > > > > > > > > > system, I think. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > 2. Regarding admin client needing > min > >> and > >> > > max > >> > > > > > > > > > information - > >> > > > > > > > > > > > you > >> > > > > > > > > > > > > > are > >> > > > > > > > > > > > > > > > > > right! > >> > > > > > > > > > > > > > > > > > > I've changed the KIP such that the > >> Admin > >> > > API > >> > > > > also > >> > > > > > > > > allows > >> > > > > > > > > > > the > >> > > > > > > > > > > > > user > >> > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > read > >> > > > > > > > > > > > > > > > > > > 'supported features' from a specific > >> > > broker. > >> > > > > > Please > >> > > > > > > > > look > >> > > > > > > > > > at > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > > > section > >> > > > > > > > > > > > > > > > > > > "Admin API changes". > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Thanks. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs > >> `Long` > >> > - > >> > > it > >> > > > > was > >> > > > > > > not > >> > > > > > > > > > > > > deliberate. > >> > > > > > > > > > > > > > > > I've > >> > > > > > > > > > > > > > > > > > > improved the KIP to just use `long` > at > >> > all > >> > > > > > places. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Sounds good. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > 4. Regarding > >> kafka.admin.FeatureCommand > >> > > tool > >> > > > - > >> > > > > > you > >> > > > > > > > are > >> > > > > > > > > > > right! > >> > > > > > > > > > > > > > I've > >> > > > > > > > > > > > > > > > > > updated > >> > > > > > > > > > > > > > > > > > > the KIP sketching the functionality > >> > > provided > >> > > > by > >> > > > > > > this > >> > > > > > > > > > tool, > >> > > > > > > > > > > > with > >> > > > > > > > > > > > > > > some > >> > > > > > > > > > > > > > > > > > > examples. Please look at the section > >> > > "Tooling > >> > > > > > > support > >> > > > > > > > > > > > > examples". > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Thank you! > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Thanks, Kowshik. > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > cheers, > >> > > > > > > > > > > > > > > > > > Colin > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Cheers, > >> > > > > > > > > > > > > > > > > > > Kowshik > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM > Colin > >> > > > McCabe < > >> > > > > > > > > > > > > > cmcc...@apache.org> > >> > > > > > > > > > > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good. > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > In the "Schema" section, do we > >> really > >> > > need > >> > > > > both > >> > > > > > > > > > > > > > > __schema_version__ > >> > > > > > > > > > > > > > > > > and > >> > > > > > > > > > > > > > > > > > > > __data_version__? Can we just > have > >> a > >> > > > single > >> > > > > > > > version > >> > > > > > > > > > > field > >> > > > > > > > > > > > > > here? > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client) > function > >> > have > >> > > > > some > >> > > > > > > way > >> > > > > > > > to > >> > > > > > > > > > get > >> > > > > > > > > > > > the > >> > > > > > > > > > > > > > min > >> > > > > > > > > > > > > > > > and > >> > > > > > > > > > > > > > > > > > max > >> > > > > > > > > > > > > > > > > > > > information that we're exposing as > >> > > well? I > >> > > > > > guess > >> > > > > > > > we > >> > > > > > > > > > > could > >> > > > > > > > > > > > > have > >> > > > > > > > > > > > > > > > min, > >> > > > > > > > > > > > > > > > > > max, > >> > > > > > > > > > > > > > > > > > > > and current. Unrelated: is the > use > >> of > >> > > Long > >> > > > > > > rather > >> > > > > > > > > than > >> > > > > > > > > > > > long > >> > > > > > > > > > > > > > > > > deliberate > >> > > > > > > > > > > > > > > > > > > > here? > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > It would be good to describe how > the > >> > > > command > >> > > > > > line > >> > > > > > > > > tool > >> > > > > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will > >> work. > >> > > For > >> > > > > > > example > >> > > > > > > > > the > >> > > > > > > > > > > > flags > >> > > > > > > > > > > > > > that > >> > > > > > > > > > > > > > > > it > >> > > > > > > > > > > > > > > > > > will > >> > > > > > > > > > > > > > > > > > > > take and the output that it will > >> > generate > >> > > > to > >> > > > > > > > STDOUT. > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > cheers, > >> > > > > > > > > > > > > > > > > > > > Colin > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, > >> Kowshik > >> > > > > > Prakasam > >> > > > > > > > > wrote: > >> > > > > > > > > > > > > > > > > > > > > Hi all, > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > I've opened KIP-584 > >> > > > > > > > > > > > > > > < > >> https://issues.apache.org/jira/browse/KIP-584> > >> > < > >> > > > > > > > > > > > > > > > > >> https://issues.apache.org/jira/browse/KIP-584 > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > which > >> > > > > > > > > > > > > > > > > > > > > is intended to provide a > >> versioning > >> > > > scheme > >> > > > > > for > >> > > > > > > > > > > features. > >> > > > > > > > > > > > > I'd > >> > > > > > > > > > > > > > > like > >> > > > > > > > > > > > > > > > > to > >> > > > > > > > > > > > > > > > > > use > >> > > > > > > > > > > > > > > > > > > > > this thread to discuss the same. > >> I'd > >> > > > > > appreciate > >> > > > > > > > any > >> > > > > > > > > > > > > feedback > >> > > > > > > > > > > > > > on > >> > > > > > > > > > > > > > > > > this. > >> > > > > > > > > > > > > > > > > > > > > Here > >> > > > > > > > > > > > > > > > > > > > > is a link to KIP-584 > >> > > > > > > > > > > > > > > < > >> https://issues.apache.org/jira/browse/KIP-584>: > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features > >> > > > > > > > > > > > > > > > > > > > > . > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > Thank you! > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > Cheers, > >> > > > > > > > > > > > > > > > > > > > > Kowshik > >> > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >