Hi Colin,

Thanks for the feedback! I have updated the KIP accordingly.
Please find my response below.

> The discussion on ZooKeeper reads versus writes makes sense to me.  The
important thing to keep in mind here is that in the bridge release,
> all brokers can read from ZooKeeper, but only the controller writes.

(Kowshik): Great, thanks!

> Why do we need both UpdateFeaturesRequest and DeleteFeaturesRequest?  It
seems awkward to have "deleting" be a special case here when the
> general idea is that we have an RPC to change the supported feature
flags.  Changing the feature level from 2 to 1 doesn't seem that different
> from changing it from 1 to not present.

(Kowshik): Done, makes sense. I have updated the KIP to use just one API,
namely UpdateFeaturesRequest. For the deletion case, we can simply ignore
the version level passed in the API (it is indicative of 'not present').

> It would be simpler to just say that a feature flag which doesn't appear
in the znode is considered to be at version level 0.  This will also
> simplify the code a lot, I think, since you won't have to keep track of
tricky distinctions between "disabled" and "enabled at version 0."
> Then you would be able to just use an int in most places.

(Kowshik): I'm not sure I understood why we want to do it this way. If an
entry for some finalized feature is absent in the '/features' node, we can
alternatively treat it as a feature whose version was never
finalized/enabled, or one that was deleted at some point. That way we can
even allow for "enabled at version 0", since the {minVersion, maxVersion}
range can be any valid range and does not necessarily require minVersion > 0.
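
To make the distinction concrete, here is a rough sketch of how a broker-side
cache could interpret the contents of the '/features' node. The class and
field names below are mine, purely for illustration, and not necessarily what
the KIP will end up using:

    import java.util.Map;
    import java.util.Optional;

    // Illustration only: an absent key means "never finalized (or deleted)",
    // whereas a present key may legitimately carry minVersion == 0,
    // i.e. "enabled at version level 0".
    final class VersionRange {
        final long minVersion;
        final long maxVersion;
        VersionRange(long minVersion, long maxVersion) {
            this.minVersion = minVersion;
            this.maxVersion = maxVersion;
        }
    }

    final class FinalizedFeatureCache {
        private final Map<String, VersionRange> finalized;
        FinalizedFeatureCache(Map<String, VersionRange> finalized) {
            this.finalized = finalized;
        }
        Optional<VersionRange> finalizedRange(String featureName) {
            return Optional.ofNullable(finalized.get(featureName));
        }
    }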

> (By the way, I would propose the term "version level" for this number,
since it avoids confusion with all the other meanings of the word
> "version" that we have in the code.)

(Kowshik): Good idea! I have updated the KIP to refer to "version level"
instead of version.

> Another thing to keep in mind is that if a request RPC is batch, the
corresponding response RPC also needs to be batch.  In other words, you
> need multiple error codes, one for each feature flag whose level you are
trying to change.  Unless the idea is that the whole change is a
> transaction that all either happens or doesn't?

(Kowshik): Yes, the whole change is a transaction. Either all of the
provided FeatureUpdates are carried out in ZK, or none of them is. That's
why we allow for just a single error code field, as it is simpler that way.
This transactional guarantee is mentioned under 'Proposed changes > New
controller API'.

> Rather than FeatureUpdateType, I would just go with a boolean like
"force."  I'm not sure what other values we'd want to add to this later on,
> if it were an enum.  I think the boolean is clearer.

(Kowshik): Since we have decided to go with just one API (i.e.
UpdateFeaturesRequest), it is better for FeatureUpdateType to be an enum
with multiple values. A FeatureUpdateType is tied to a feature, and the
possible values are: ADD_OR_UPDATE, ADD_OR_UPDATE_ALLOW_DOWNGRADE, DELETE.
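
For illustration, here is a minimal sketch of how such an enum could look
(the exact shape in the KIP may differ):

    // Illustration only: one update type per feature in the UpdateFeaturesRequest.
    enum FeatureUpdateType {
        ADD_OR_UPDATE,                  // finalize a feature, or raise its version level
        ADD_OR_UPDATE_ALLOW_DOWNGRADE,  // same, but explicitly permit a lower version level
        DELETE                          // remove the finalized feature; any version level is ignored
    }

This also covers the deletion case discussed earlier, without needing a
separate DeleteFeaturesRequest.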

> This ties in with my comment earlier, but for the result classes, we need
methods other than just "all".  Batch operations aren't usable if
> you can't get the result per operation.... unless the semantics are
transactional and it really is just everything succeeded or everything
> failed.

(Kowshik): The semantics are transactional, as I explained above.
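
As a rough sketch of what this means for the result class (assuming the
usual Java AdminClient convention of returning KafkaFuture-based results;
the names here are illustrative):

    import org.apache.kafka.common.KafkaFuture;

    // Illustration only: because the update is all-or-nothing, the result can
    // expose a single future for the whole batch instead of one per feature.
    final class UpdateFeaturesResult {
        private final KafkaFuture<Void> all;
        UpdateFeaturesResult(KafkaFuture<Void> all) {
            this.all = all;
        }
        // Completes normally only if every FeatureUpdate in the request was
        // applied to the '/features' ZK node; otherwise completes exceptionally.
        KafkaFuture<Void> all() {
            return this.all;
        }
    }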

> There are a bunch of Java interfaces described like FinalizedFeature,
FeatureUpdate, UpdateFeaturesResult, and so on that should just be
> regular concrete Java classes.  In general we'd only use an interface if
we wanted the caller to implement some kind of callback function. We
> don't make classes that are just designed to hold data into interfaces,
since that just imposes extra work on callers (they have to define
> their own concrete class for each interface just to use the API.)
 There's also probably no reason to have these classes inherit from each
> other or have complex type relationships.  One more nitpick is that Kafka
generally doesn't use "get" in the function names of accessors.

(Kowshik): Done, I have changed the KIP. By 'interface', I just meant an
interface in a pseudocode sense (i.e. an abstraction providing at least the
specified behavior). Since that was a bit confusing, I have now called these
plain classes instead. I have also eliminated the type relationships.
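
For example, FeatureUpdate could be just a plain data class along these
lines (again only a sketch, reusing the FeatureUpdateType enum sketched
above; accessors have no "get" prefix and there is no type hierarchy):

    // Illustration only: a concrete class that simply holds data, with no
    // inheritance and no "get" prefix on its accessors, per Kafka conventions.
    final class FeatureUpdate {
        private final String featureName;
        private final long versionLevel;   // ignored when updateType == DELETE
        private final FeatureUpdateType updateType;

        FeatureUpdate(String featureName, long versionLevel, FeatureUpdateType updateType) {
            this.featureName = featureName;
            this.versionLevel = versionLevel;
            this.updateType = updateType;
        }

        String featureName() { return this.featureName; }
        long versionLevel() { return this.versionLevel; }
        FeatureUpdateType updateType() { return this.updateType; }
    }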


Cheers,
Kowshik

On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kpraka...@confluent.io>
wrote:

> Hi Jun,
>
> Thanks for the feedback and suggestions. Please find my response below.
>
> > 100.6 For every new request, the admin needs to control who is allowed to
> > issue that request if security is enabled. So, we need to assign the new
> > request a ResourceType and possible AclOperations. See
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > as an example.
>
> (Kowshik): I don't see any reference to the words ResourceType or
> AclOperations in the KIP. Could you please point me to how I can use the
> KIP that you linked to figure out how to set up the appropriate
> ResourceType and/or ClusterOperation?
>
> > 105. If we change delete to disable, it's better to do this consistently
> in
> > request protocol and admin api as well.
>
> (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> feature.
> I've just changed the KIP to use 'delete'. I don't have a strong
> preference.
>
> > 110. The minVersion/maxVersion for features use int64. Currently, our
> > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > for new features to be included in minor releases too. Should we make the
> > feature versioning match the release versioning?
>
> (Kowshik): The release version can be mapped to a set of feature versions,
> and this can be done, for example in the tool (or even external to the
> tool).
> Can you please clarify what I'm missing?
>
> > 111. "During regular operations, the data in the ZK node can be mutated
> > only via a specific admin API served only by the controller." I am
> > wondering why can't the controller auto finalize a feature version after
> > all brokers are upgraded? For new users who download the latest version
> to
> > build a new cluster, it's inconvenient for them to have to manually
> enable
> > each feature.
>
> (Kowshik): I agree that there is a trade-off here, but it would help to
> decide whether the automation should be thought through right now in this
> KIP, or in a future follow-up KIP. We may well invest in automation; we
> just have to decide whether to do it now or later.
>
> For the inconvenience that you mentioned, do you think it can be overcome
> by asking the cluster operator to run a bootstrap script when he/she knows
> that a specific AK release has been almost completely deployed in a cluster
> for the first time? The idea is that the bootstrap script will know how to
> map a specific AK release to finalized feature versions, and will run the
> `kafka-features.sh` tool appropriately against the cluster.
>
> Now, coming back to your automation proposal/question: I do see the value
> of automated feature version finalization, but I also see that it will open
> up several questions and some risks, as explained below. The answers to
> these depend on the definition of the automation we choose to build, and on
> how well it fits into a Kafka deployment. Basically, it can be unsafe for
> the controller to finalize feature version upgrades automatically, without
> learning about the intent of the cluster operator.
> 1. We would sometimes want to lock feature versions only when we have
> externally verified
> the stability of the broker binary.
> 2. Sometimes only the cluster operator knows that a cluster upgrade is
> complete,
> and new brokers are highly unlikely to join the cluster.
> 3. Only the cluster operator knows that the intent is to deploy the same
> version
> of the new broker release across the entire cluster (i.e. the latest
> downloaded version).
> 4. For downgrades, it appears the controller still needs some external
> input
> (such as the proposed tool) to finalize a feature version downgrade.
>
> If we have automation, that automation can end up failing in some of the
> cases
> above. Then, we need a way to declare that the cluster is "not ready" if
> the
> controller cannot automatically finalize some basic required feature
> version
> upgrades across the cluster. We need to make the cluster operator aware of
> such a scenario (e.g. by raising an alert or the like).
>
> > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> 48.
>
> (Kowshik): Done.
>
>
> Cheers,
> Kowshik
>
> On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> Thanks for the reply. A few more comments below.
>>
>> 100.6 For every new request, the admin needs to control who is allowed to
>> issue that request if security is enabled. So, we need to assign the new
>> request a ResourceType and possible AclOperations. See
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
>> as
>> an example.
>>
>> 105. If we change delete to disable, it's better to do this consistently
>> in
>> request protocol and admin api as well.
>>
>> 110. The minVersion/maxVersion for features use int64. Currently, our
>> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
>> for new features to be included in minor releases too. Should we make the
>> feature versioning match the release versioning?
>>
>> 111. "During regular operations, the data in the ZK node can be mutated
>> only via a specific admin API served only by the controller." I am
>> wondering why can't the controller auto finalize a feature version after
>> all brokers are upgraded? For new users who download the latest version to
>> build a new cluster, it's inconvenient for them to have to manually enable
>> each feature.
>>
>> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
>> 48.
>>
>> Jun
>>
>>
>> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kpraka...@confluent.io>
>> wrote:
>>
>> > Hey Jun,
>> >
>> > Thanks a lot for the great feedback! Please note that the design
>> > has changed a little bit on the KIP, and we now propagate the finalized
>> > features metadata only via ZK watches (instead of UpdateMetadataRequest
>> > from the controller).
>> >
>> > Please find below my response to your questions/feedback, with the
>> prefix
>> > "(Kowshik):".
>> >
>> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > 100.1 Since this request waits for responses from brokers, should we
>> add
>> > a
>> > > timeout in the request (like createTopicRequest)?
>> >
>> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
>> > longer
>> > wait for responses from brokers, since the design has been changed so
>> that
>> > the
>> > features information is propagated via ZK. Nevertheless, it is right to
>> > have a timeout
>> > for the request.
>> >
>> > > 100.2 The response schema is a bit weird. Typically, the response just
>> > > shows an error code and an error message, instead of echoing the
>> request.
>> >
>> > (Kowshik): Great point! Yeah, I have modified it to just return an error
>> > code and a message.
>> > Previously it was not echoing the "request", rather it was returning the
>> > latest set of
>> > cluster-wide finalized features (after applying the updates). But you
>> are
>> > right,
>> > the additional info is not required, so I have removed it from the
>> response
>> > schema.
>> >
>> > > 100.3 Should we add a separate request to list/describe the existing
>> > > features?
>> >
>> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
>> > Admin API,
>> > which, underneath covers uses the ApiVersionsRequest to list/describe
>> the
>> > existing features. Please read the 'Tooling support' section.
>> >
>> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> > > DELETE, the version field doesn't make sense. So, I guess the broker
>> just
>> > > ignores this? An alternative way is to have a separate
>> > DeleteFeaturesRequest
>> >
>> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
>> > controller APIs
>> > serving these different purposes:
>> > 1. updateFeatures
>> > 2. deleteFeatures
>> >
>> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> > > version of the metadata for finalized features." I am wondering why
>> the
>> > > ordering is important?
>> >
>> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
>> > version), and
>> > it is just the ZK node version. Basically, this is the epoch for the
>> > cluster-wide
>> > finalized feature version metadata. This metadata is served to clients
>> via
>> > the
>> > ApiVersionsResponse (for reads). We propagate updates from the
>> '/features'
>> > ZK node
>> > to all brokers, via ZK watches setup by each broker on the '/features'
>> > node.
>> >
>> > Now here is why the ordering is important:
>> > ZK watches don't propagate at the same time. As a result, the
>> > ApiVersionsResponse
>> > is eventually consistent across brokers. This can introduce cases
>> > where clients see an older lower epoch of the features metadata, after a
>> > more recent
>> > higher epoch was returned at a previous point in time. We expect clients
>> > to always employ the rule that the latest received higher epoch of
>> metadata
>> > always trumps an older smaller epoch. Those clients that are external to
>> > Kafka should strongly consider discovering the latest metadata once
>> during
>> > startup from the brokers, and if required refresh the metadata
>> periodically
>> > (to get the latest metadata).
>> >
>> > > 100.6 Could you specify the required ACL for this new request?
>> >
>> > (Kowshik): What is ACL, and how could I find out which one to specify?
>> > Please could you provide me some pointers? I'll be glad to update the
>> > KIP once I know the next steps.
>> >
>> > > 101. For the broker registration ZK node, should we bump up the
>> version
>> > in
>> > the json?
>> >
>> > (Kowshik): Great point! Done. I've increased the version in the broker
>> json
>> > by 1.
>> >
>> > > 102. For the /features ZK node, not sure if we need the epoch field.
>> Each
>> > > ZK node has an internal version field that is incremented on every
>> > update.
>> >
>> > (Kowshik): Great point! Done. I'm using the ZK node version now,
>> instead of
>> > explicitly
>> > incremented epoch.
>> >
>> > > 103. "Enabling the actual semantics of a feature version cluster-wide
>> is
>> > > left to the discretion of the logic implementing the feature (ex: can
>> be
>> > > done via dynamic broker config)." Does that mean the broker
>> registration
>> > ZK
>> > > node will be updated dynamically when this happens?
>> >
>> > (Kowshik): Not really. The text was just conveying that a broker could
>> > "know" of
>> > a new feature version, but it does not mean the broker should have also
>> > activated the effects of the feature version. Knowing vs activation are
>> 2
>> > separate things,
>> > and the latter can be achieved by dynamic config. I have reworded the
>> text
>> > to
>> > make this clear to the reader.
>> >
>> >
>> > > 104. UpdateMetadataRequest
>> > > 104.1 It would be useful to describe when the feature metadata is
>> > included
>> > > in the request. My understanding is that it's only included if (1)
>> there
>> > is
>> > > a change to the finalized feature; (2) broker restart; (3) controller
>> > > failover.
>> > > 104.2 The new fields have the following versions. Why are the
>> versions 3+
>> > > when the top version is bumped to 6?
>> > >       "fields":  [
>> > >         {"name": "Name", "type":  "string", "versions":  "3+",
>> > >           "about": "The name of the feature."},
>> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
>> > >           "about": "The finalized version for the feature."}
>> > >       ]
>> >
>> > (Kowshik): With the new improved design, we have completely eliminated
>> the
>> > need to
>> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
>> the
>> > notifications for changes to the '/features' ZK node.
>> >
>> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better
>> > > to use enable/disable?
>> >
>> > (Kowshik): For delete, yes, I have changed it so that we instead call it
>> > 'disable'.
>> > However for 'update', it can now also refer to either an upgrade or a
>> > forced downgrade.
>> > Therefore, I have left it the way it is, just calling it as just
>> 'update'.
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> wrote:
>> >
>> > > Hi, Kowshik,
>> > >
>> > > Thanks for the KIP. Looks good overall. A few comments below.
>> > >
>> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> > > 100.1 Since this request waits for responses from brokers, should we
>> add
>> > a
>> > > timeout in the request (like createTopicRequest)?
>> > > 100.2 The response schema is a bit weird. Typically, the response just
>> > > shows an error code and an error message, instead of echoing the
>> request.
>> > > 100.3 Should we add a separate request to list/describe the existing
>> > > features?
>> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> > > DELETE, the version field doesn't make sense. So, I guess the broker
>> just
>> > > ignores this? An alternative way is to have a separate
>> > > DeleteFeaturesRequest
>> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> > > version of the metadata for finalized features." I am wondering why
>> the
>> > > ordering is important?
>> > > 100.6 Could you specify the required ACL for this new request?
>> > >
>> > > 101. For the broker registration ZK node, should we bump up the
>> version
>> > in
>> > > the json?
>> > >
>> > > 102. For the /features ZK node, not sure if we need the epoch field.
>> Each
>> > > ZK node has an internal version field that is incremented on every
>> > update.
>> > >
>> > > 103. "Enabling the actual semantics of a feature version cluster-wide
>> is
>> > > left to the discretion of the logic implementing the feature (ex: can
>> be
>> > > done via dynamic broker config)." Does that mean the broker
>> registration
>> > ZK
>> > > node will be updated dynamically when this happens?
>> > >
>> > > 104. UpdateMetadataRequest
>> > > 104.1 It would be useful to describe when the feature metadata is
>> > included
>> > > in the request. My understanding is that it's only included if (1)
>> there
>> > is
>> > > a change to the finalized feature; (2) broker restart; (3) controller
>> > > failover.
>> > > 104.2 The new fields have the following versions. Why are the
>> versions 3+
>> > > when the top version is bumped to 6?
>> > >       "fields":  [
>> > >         {"name": "Name", "type":  "string", "versions":  "3+",
>> > >           "about": "The name of the feature."},
>> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
>> > >           "about": "The finalized version for the feature."}
>> > >       ]
>> > >
>> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> > better
>> > > to use enable/disable?
>> > >
>> > > Jun
>> > >
>> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
>> kpraka...@confluent.io
>> > >
>> > > wrote:
>> > >
>> > > > Hey Boyang,
>> > > >
>> > > > Thanks for the great feedback! I have updated the KIP based on your
>> > > > feedback.
>> > > > Please find my response below for your comments, look for sentences
>> > > > starting
>> > > > with "(Kowshik)" below.
>> > > >
>> > > >
>> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> > > could
>> > > > be
>> > > > > converted as "When is it safe for the brokers to start serving new
>> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
>> > the
>> > > > > context.
>> > > >
>> > > > (Kowshik): Great point! Done.
>> > > >
>> > > > > 2. In the *Explanation *section, the metadata version number part
>> > > seems a
>> > > > > bit blurred. Could you point a reference to later section that we
>> > going
>> > > > to
>> > > > > store it in Zookeeper and update it every time when there is a
>> > feature
>> > > > > change?
>> > > >
>> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
>> > > >
>> > > >
>> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
>> > for
>> > > > > features such as group coordinator semantics, there is no legal
>> > > scenario
>> > > > to
>> > > > > perform a downgrade at all. So having downgrade door open is
>> pretty
>> > > > > error-prone as human faults happen all the time. I'm assuming as
>> new
>> > > > > features are implemented, it's not very hard to add a flag during
>> > > feature
>> > > > > creation to indicate whether this feature is "downgradable". Could
>> > you
>> > > > > explain a bit more on the extra engineering effort for shipping
>> this
>> > > KIP
>> > > > > with downgrade protection in place?
>> > > >
>> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
>> that
>> > > > accidental
>> > > > downgrades can cause problems, I also think sometimes downgrades
>> should
>> > > > be allowed for emergency reasons (not all downgrades cause issues).
>> > > > It is just subjective to the feature being downgraded.
>> > > >
>> > > > To be more strict about feature version downgrades, I have modified
>> the
>> > > KIP
>> > > > proposing that we mandate a `--force-downgrade` flag be used in the
>> > > > UPDATE_FEATURES api
>> > > > and the tooling, whenever the human is downgrading a finalized
>> feature
>> > > > version.
>> > > > Hopefully this should cover the requirement, until we find the need
>> for
>> > > > advanced downgrade support.
>> > > >
>> > > > > 4. "Each broker’s supported dictionary of feature versions will be
>> > > > defined
>> > > > > in the broker code." So this means in order to restrict a certain
>> > > > feature,
>> > > > > we need to start the broker first and then send a feature gating
>> > > request
>> > > > > immediately, which introduces a time gap and the intended-to-close
>> > > > feature
>> > > > > could actually serve request during this phase. Do you think we
>> > should
>> > > > also
>> > > > > support configurations as well so that admin user could freely
>> roll
>> > up
>> > > a
>> > > > > cluster with all nodes complying the same feature gating, without
>> > > > worrying
>> > > > > about the turnaround time to propagate the message only after the
>> > > cluster
>> > > > > starts up?
>> > > >
>> > > > (Kowshik): This is a great point/question. One of the expectations
>> out
>> > of
>> > > > this KIP, which is
>> > > > already followed in the broker, is the following.
>> > > >  - Imagine at time T1 the broker starts up and registers it’s
>> presence
>> > in
>> > > > ZK,
>> > > >    along with advertising it’s supported features.
>> > > >  - Imagine at a future time T2 the broker receives the
>> > > > UpdateMetadataRequest
>> > > >    from the controller, which contains the latest finalized
>> features as
>> > > > seen by
>> > > >    the controller. The broker validates this data against it’s
>> > supported
>> > > > features to
>> > > >    make sure there is no mismatch (it will shutdown if there is an
>> > > > incompatibility).
>> > > >
>> > > > It is expected that during the time between the 2 events T1 and T2,
>> the
>> > > > broker is
>> > > > almost a silent entity in the cluster. It does not add any value to
>> the
>> > > > cluster, or carry
>> > > > out any important broker activities. By “important”, I mean it is
>> not
>> > > doing
>> > > > mutations
>> > > > on it’s persistence, not mutating critical in-memory state, won’t be
>> > > > serving
>> > > > produce/fetch requests. Note it doesn’t even know it’s assigned
>> > > partitions
>> > > > until
>> > > > it receives UpdateMetadataRequest from controller. Anything the
>> broker
>> > is
>> > > > doing up
>> > > > until this point is not damaging/useful.
>> > > >
>> > > > I’ve clarified the above in the KIP, see this new section:
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>> > > > .
>> > > >
>> > > > > 5. "adding a new Feature, updating or deleting an existing
>> Feature",
>> > > may
>> > > > be
>> > > > > I misunderstood something, I thought the features are defined in
>> > broker
>> > > > > code, so admin could not really create a new feature?
>> > > >
>> > > > (Kowshik): Great point! You understood this right. Here adding a
>> > feature
>> > > > means we are
>> > > > adding a cluster-wide finalized *max* version for a feature that was
>> > > > previously never finalized.
>> > > > I have clarified this in the KIP now.
>> > > >
>> > > > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS
>> > > > to
>> > > > > reject a concurrent feature update request.
>> > > >
>> > > > (Kowshik): Great point! I have modified the KIP adding the above
>> (see
>> > > > 'Tooling support -> Admin API changes').
>> > > >
>> > > > > 7. I think we haven't discussed the alternative solution to pass
>> the
>> > > > > feature information through Zookeeper. Is that mentioned in the
>> KIP
>> > to
>> > > > > justify why using UpdateMetadata is more favorable?
>> > > >
>> > > > (Kowshik): Nice question! The broker reads finalized feature info
>> > stored
>> > > in
>> > > > ZK,
>> > > > only during startup when it does a validation. When serving
>> > > > `ApiVersionsRequest`, the
>> > > > broker does not read this info from ZK directly. I'd imagine the
>> risk
>> > is
>> > > > that it can increase
>> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
>> > Kafka
>> > > > we use the
>> > > > controller to fan out ZK updates to brokers and we want to stick to
>> > that
>> > > > pattern to avoid
>> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
>> > > >
>> > > > > 8. I was under the impression that user could configure a range of
>> > > > > supported versions, what's the trade-off for allowing single
>> > finalized
>> > > > > version only?
>> > > >
>> > > > (Kowshik): Great question! The finalized version of a feature
>> basically
>> > > > refers to
>> > > > the cluster-wide finalized feature "maximum" version. For example,
>> if
>> > the
>> > > > 'group_coordinator' feature
>> > > > has the finalized version set to 10, then, it means that
>> cluster-wide
>> > all
>> > > > versions upto v10 are
>> > > > supported for this feature. However, note that if some version (ex:
>> v0)
>> > > > gets deprecated
>> > > > for this feature, then we don’t convey that using this scheme (also
>> > > > supporting deprecation is a non-goal).
>> > > >
>> > > > (Kowshik): I’ve now modified the KIP at all points, refering to
>> > finalized
>> > > > feature "maximum" versions.
>> > > >
>> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > > > producer
>> > > >
>> > > > (Kowshik): Great point! Done.
>> > > >
>> > > >
>> > > > Cheers,
>> > > > Kowshik
>> > > >
>> > > >
>> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
>> > reluctanthero...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hey Kowshik,
>> > > > >
>> > > > > thanks for the revised KIP. Got a couple of questions:
>> > > > >
>> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> > > could
>> > > > be
>> > > > > converted as "When is it safe for the brokers to start serving new
>> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
>> > the
>> > > > > context.
>> > > > >
>> > > > > 2. In the *Explanation *section, the metadata version number part
>> > > seems a
>> > > > > bit blurred. Could you point a reference to later section that we
>> > going
>> > > > to
>> > > > > store it in Zookeeper and update it every time when there is a
>> > feature
>> > > > > change?
>> > > > >
>> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
>> > for
>> > > > > features such as group coordinator semantics, there is no legal
>> > > scenario
>> > > > to
>> > > > > perform a downgrade at all. So having downgrade door open is
>> pretty
>> > > > > error-prone as human faults happen all the time. I'm assuming as
>> new
>> > > > > features are implemented, it's not very hard to add a flag during
>> > > feature
>> > > > > creation to indicate whether this feature is "downgradable". Could
>> > you
>> > > > > explain a bit more on the extra engineering effort for shipping
>> this
>> > > KIP
>> > > > > with downgrade protection in place?
>> > > > >
>> > > > > 4. "Each broker’s supported dictionary of feature versions will be
>> > > > defined
>> > > > > in the broker code." So this means in order to restrict a certain
>> > > > feature,
>> > > > > we need to start the broker first and then send a feature gating
>> > > request
>> > > > > immediately, which introduces a time gap and the intended-to-close
>> > > > feature
>> > > > > could actually serve request during this phase. Do you think we
>> > should
>> > > > also
>> > > > > support configurations as well so that admin user could freely
>> roll
>> > up
>> > > a
>> > > > > cluster with all nodes complying the same feature gating, without
>> > > > worrying
>> > > > > about the turnaround time to propagate the message only after the
>> > > cluster
>> > > > > starts up?
>> > > > >
>> > > > > 5. "adding a new Feature, updating or deleting an existing
>> Feature",
>> > > may
>> > > > be
>> > > > > I misunderstood something, I thought the features are defined in
>> > broker
>> > > > > code, so admin could not really create a new feature?
>> > > > >
>> > > > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS
>> > > > to
>> > > > > reject a concurrent feature update request.
>> > > > >
>> > > > > 7. I think we haven't discussed the alternative solution to pass
>> the
>> > > > > feature information through Zookeeper. Is that mentioned in the
>> KIP
>> > to
>> > > > > justify why using UpdateMetadata is more favorable?
>> > > > >
>> > > > > 8. I was under the impression that user could configure a range of
>> > > > > supported versions, what's the trade-off for allowing single
>> > finalized
>> > > > > version only?
>> > > > >
>> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > > > producer
>> > > > >
>> > > > > Boyang
>> > > > >
>> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmcc...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > > > > Hi Colin,
>> > > > > > >
>> > > > > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > > > > suggestions.
>> > > > > > > Please find below my explanation. Here is a link to KIP 584:
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > > .
>> > > > > > >
>> > > > > > > 1. '__data_version__' is the version of the finalized feature
>> > > > metadata
>> > > > > > > (i.e. actual ZK node contents), while the
>> '__schema_version__' is
>> > > the
>> > > > > > > version of the schema of the data persisted in ZK. These serve
>> > > > > different
>> > > > > > > purposes. '__data_version__' is is useful mainly to clients
>> > during
>> > > > > reads,
>> > > > > > > to differentiate between the 2 versions of eventually
>> consistent
>> > > > > > 'finalized
>> > > > > > > features' metadata (i.e. larger metadata version is more
>> recent).
>> > > > > > > '__schema_version__' provides an additional degree of
>> > flexibility,
>> > > > > where
>> > > > > > if
>> > > > > > > we decide to change the schema for '/features' node in ZK (in
>> the
>> > > > > > future),
>> > > > > > > then we can manage broker roll outs suitably (i.e.
>> > > > > > > serialization/deserialization of the ZK data can be handled
>> > > safely).
>> > > > > >
>> > > > > > Hi Kowshik,
>> > > > > >
>> > > > > > If you're talking about a number that lets you know if data is
>> more
>> > > or
>> > > > > > less recent, we would typically call that an epoch, and not a
>> > > version.
>> > > > > For
>> > > > > > the ZK data structures, the word "version" is typically reserved
>> > for
>> > > > > > describing changes to the overall schema of the data that is
>> > written
>> > > to
>> > > > > > ZooKeeper.  We don't even really change the "version" of those
>> > > schemas
>> > > > > that
>> > > > > > much, since most changes are backwards-compatible.  But we do
>> > include
>> > > > > that
>> > > > > > version field just in case.
>> > > > > >
>> > > > > > I don't think we really need an epoch here, though, since we can
>> > just
>> > > > > look
>> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
>> will
>> > > be
>> > > > > > greater than the previous broker epoch.  And the newly
>> registered
>> > > data
>> > > > > will
>> > > > > > take priority.  This will be a lot simpler than adding a
>> separate
>> > > epoch
>> > > > > > system, I think.
>> > > > > >
>> > > > > > >
>> > > > > > > 2. Regarding admin client needing min and max information -
>> you
>> > are
>> > > > > > right!
>> > > > > > > I've changed the KIP such that the Admin API also allows the
>> user
>> > > to
>> > > > > read
>> > > > > > > 'supported features' from a specific broker. Please look at
>> the
>> > > > section
>> > > > > > > "Admin API changes".
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > >
>> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
>> deliberate.
>> > > > I've
>> > > > > > > improved the KIP to just use `long` at all places.
>> > > > > >
>> > > > > > Sounds good.
>> > > > > >
>> > > > > > >
>> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
>> > I've
>> > > > > > updated
>> > > > > > > the KIP sketching the functionality provided by this tool,
>> with
>> > > some
>> > > > > > > examples. Please look at the section "Tooling support
>> examples".
>> > > > > > >
>> > > > > > > Thank you!
>> > > > > >
>> > > > > >
>> > > > > > Thanks, Kowshik.
>> > > > > >
>> > > > > > cheers,
>> > > > > > Colin
>> > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Kowshik
>> > > > > > >
>> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
>> > cmcc...@apache.org>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks, Kowshik, this looks good.
>> > > > > > > >
>> > > > > > > > In the "Schema" section, do we really need both
>> > > __schema_version__
>> > > > > and
>> > > > > > > > __data_version__?  Can we just have a single version field
>> > here?
>> > > > > > > >
>> > > > > > > > Shouldn't the Admin(Client) function have some way to get
>> the
>> > min
>> > > > and
>> > > > > > max
>> > > > > > > > information that we're exposing as well?  I guess we could
>> have
>> > > > min,
>> > > > > > max,
>> > > > > > > > and current.  Unrelated: is the use of Long rather than long
>> > > > > deliberate
>> > > > > > > > here?
>> > > > > > > >
>> > > > > > > > It would be good to describe how the command line tool
>> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
>> > that
>> > > > it
>> > > > > > will
>> > > > > > > > take and the output that it will generate to STDOUT.
>> > > > > > > >
>> > > > > > > > cheers,
>> > > > > > > > Colin
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > > > > Hi all,
>> > > > > > > > >
>> > > > > > > > > I've opened KIP-584
>> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584> which
>> > > > > > > > > is intended to provide a versioning scheme for features.
>> I'd
>> > > like
>> > > > > to
>> > > > > > use
>> > > > > > > > > this thread to discuss the same. I'd appreciate any
>> feedback
>> > on
>> > > > > this.
>> > > > > > > > > Here
>> > > > > > > > > is a link to KIP-584
>> > > > > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > > > >  .
>> > > > > > > > >
>> > > > > > > > > Thank you!
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > > Kowshik
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
