On Fri, Apr 3, 2020, at 20:32, Kowshik Prakasam wrote:
> > Colin wrote:
> > It would be simpler to just say that a feature flag which doesn't appear
> > in the znode is considered to be at version level 0.  This will also
> > simplify the code a lot, I think, since you won't have to keep track of
> > tricky distinctions between "disabled" and "enabled at version 0."
> > Then you would be able to just use an int in most places.
> 
> (Kowshik): I'm not sure I understood why we want to do it this way. If an
> entry for some finalized feature is absent in the '/features' node,
> alternatively we can just treat this as a feature with a version that
> was never finalized/enabled or it was deleted at some point. Then, we can
> even allow for "enabled at version 0" as the {minVersion, maxVersion} range
> can be any valid range, not necessarily minVersion > 0.

Think about the following pseudocode.  Which is simpler:

> if (feature is not present) || (feature level < 1) {
> ... something ...
> } else {
> ... something ...
> }

or

> if (feature level < 1) {
> ... something ...
> } else {
> ... something ...
> }

If you can just treat "not present" as version level 0, then you only need
checks like the second one.  This should lead to simpler code.
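
To make this concrete, here is a minimal sketch (class and method names are
hypothetical, not from the KIP) of what the "absent means level 0" convention
could look like on the read path:

    import java.util.Map;

    class FinalizedFeatures {
        private final Map<String, Integer> levels;

        FinalizedFeatures(Map<String, Integer> levels) {
            this.levels = levels;
        }

        // A feature missing from the '/features' znode is simply at level 0.
        int versionLevel(String feature) {
            return levels.getOrDefault(feature, 0);
        }
    }

With that in place, a check like `features.versionLevel("group_coordinator") < 1`
is the only kind callers ever write; there is no separate "not present" branch.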

> (Kowshik): Yes, the whole change is a transaction. Either all provided
> FeatureUpdates are carried out in ZK, or none of them are. That's why we
> just allow for a single error code field, as it is easier that way. This
> transactional guarantee is mentioned under 'Proposed changes > New
> controller API'

That makes sense, thanks.

> > Rather than FeatureUpdateType, I would just go with a boolean like
> > "force."  I'm not sure what other values we'd want to add to this later on,
> > if it were an enum.  I think the boolean is clearer.
> 
> (Kowshik): Since we have decided to go with just one API (i.e.
> UpdateFeaturesRequest), it is better that FeatureUpdateType is an enum with
> multiple values. A FeatureUpdateType is tied to a feature, and the possible
> values are: ADD_OR_UPDATE, ADD_OR_UPDATE_ALLOW_DOWNGRADE, DELETE.

I guess this ties in with the discussion above -- I would rather not have a 
"deleted" state.  It doesn't seem to make anything more expressive, and it 
complicates the code.
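
For comparison, here is a rough sketch of the two shapes being discussed (the
field and type layout here is illustrative, not taken from the KIP):

    // Shape A: a boolean "force" flag. With the "absent means level 0"
    // convention, a delete is just a forced update to level 0, so no
    // extra enum value is needed.
    class FeatureUpdate {
        final String feature;
        final int maxVersionLevel;   // 0 means the feature is not enabled
        final boolean force;         // permit downgrades, including to 0

        FeatureUpdate(String feature, int maxVersionLevel, boolean force) {
            this.feature = feature;
            this.maxVersionLevel = maxVersionLevel;
            this.force = force;
        }
    }

    // Shape B: the enum as currently proposed in the KIP.
    enum FeatureUpdateType {
        ADD_OR_UPDATE,
        ADD_OR_UPDATE_ALLOW_DOWNGRADE,
        DELETE
    }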

> 
> > This ties in with my comment earlier, but for the result classes, we need
> > methods other than just "all".  Batch operations aren't usable if
> > you can't get the result per operation.... unless the semantics are
> > transactional and it really is just everything succeeded or everything
> > failed.
> 
> (Kowshik): The semantics are transactional, as I explained above.

Thanks for the clarification.

> 
> > There are a bunch of Java interfaces described like FinalizedFeature,
> > FeatureUpdate, UpdateFeaturesResult, and so on that should just be
> > regular concrete Java classes.  In general we'd only use an interface if
> > we wanted the caller to implement some kind of callback function. We
> > don't make classes that are just designed to hold data into interfaces,
> > since that just imposes extra work on callers (they have to define
> > their own concrete class for each interface just to use the API.)
> >  There's also probably no reason to have these classes inherit from each
> > other or have complex type relationships.  One more nitpick is that Kafka
> > generally doesn't use "get" in the function names of accessors.
> 
> (Kowshik): Done, I have changed the KIP. By 'interface', I just meant
> interface from a pseudocode standpoint (i.e. it was just an abstraction
> providing at least the specified behavior). Since that was a bit confusing,
> I have now renamed it to a class. I have also eliminated the type
> relationships.

Thanks.

best,
Colin

> 
> 
> Cheers,
> Kowshik
> 
> On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kpraka...@confluent.io>
> wrote:
> 
> > Hi Jun,
> >
> > Thanks for the feedback and suggestions. Please find my response below.
> >
> > > 100.6 For every new request, the admin needs to control who is allowed to
> > > issue that request if security is enabled. So, we need to assign the new
> > > request a ResourceType and possible AclOperations. See
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > as an example.
> >
> > (Kowshik): I don't see any reference to the words ResourceType or
> > AclOperations in the KIP. Please let me know how I can use the KIP that
> > you linked to figure out how to set up the appropriate ResourceType
> > and/or AclOperations?
> >
> > > 105. If we change delete to disable, it's better to do this consistently
> > in
> > > request protocol and admin api as well.
> >
> > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > feature.
> > I've just changed the KIP to use 'delete'. I don't have a strong
> > preference.
> >
> > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> > > for new features to be included in minor releases too. Should we make the
> > > feature versioning match the release versioning?
> >
> > (Kowshik): The release version can be mapped to a set of feature versions,
> > and this can be done, for example in the tool (or even external to the
> > tool).
> > Can you please clarify what I'm missing?
> >
> > > 111. "During regular operations, the data in the ZK node can be mutated
> > > only via a specific admin API served only by the controller." I am
> > > wondering why can't the controller auto finalize a feature version after
> > > all brokers are upgraded? For new users who download the latest version
> > to
> > > build a new cluster, it's inconvenient for them to have to manually
> > enable
> > > each feature.
> >
> > (Kowshik): I agree that there is a trade-off here. We may well want to
> > invest in this automation, but we have to decide whether to think it
> > through right now in this KIP, or later in a follow-up KIP.
> >
> > As for the inconvenience that you mentioned, do you think the problem can
> > be overcome by asking the cluster operator to run a bootstrap script when
> > he/she knows that a specific AK release has been almost completely
> > deployed in a cluster for the first time? The idea is that the bootstrap
> > script will know how to map a specific AK release to finalized feature
> > versions, and run the `kafka-features.sh` tool appropriately against the
> > cluster.
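
A sketch of the kind of mapping such a script could embed (the releases,
feature names and version levels below are made up purely for illustration):

    import java.util.Map;

    class BootstrapFeatureLevels {
        // For each AK release, the finalized max version level to set per
        // feature via the tool. Entries here are illustrative only.
        static final Map<String, Map<String, Integer>> FINALIZED_BY_RELEASE =
            Map.of(
                "2.5", Map.of("group_coordinator", 1),
                "2.6", Map.of("group_coordinator", 2));
    }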
> >
> > Now, coming back to your automation proposal/question.
> > I do see the value of automated feature version finalization, but I also
> > see
> > that this will open up several questions and some risks, as explained
> > below.
> > The answers to these depend on the definition of the automation we choose
> > to build, and how well it fits into a Kafka deployment.
> > Basically, it can be unsafe for the controller to finalize feature version
> > upgrades automatically, without learning about the intent of the cluster
> > operator.
> > 1. We would sometimes want to lock feature versions only when we have
> > externally verified
> > the stability of the broker binary.
> > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > complete,
> > and new brokers are highly unlikely to join the cluster.
> > 3. Only the cluster operator knows that the intent is to deploy the same
> > version
> > of the new broker release across the entire cluster (i.e. the latest
> > downloaded version).
> > 4. For downgrades, it appears the controller still needs some external
> > input
> > (such as the proposed tool) to finalize a feature version downgrade.
> >
> > If we have automation, that automation can end up failing in some of the
> > cases
> > above. Then, we need a way to declare that the cluster is "not ready" if
> > the
> > controller cannot automatically finalize some basic required feature
> > version
> > upgrades across the cluster. We need to make the cluster operator aware in
> > such a scenario (e.g., by raising an alert).
> >
> > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> > 48.
> >
> > (Kowshik): Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Kowshik,
> >>
> >> Thanks for the reply. A few more comments below.
> >>
> >> 100.6 For every new request, the admin needs to control who is allowed to
> >> issue that request if security is enabled. So, we need to assign the new
> >> request a ResourceType and possible AclOperations. See
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> as
> >> an example.
> >>
> >> 105. If we change delete to disable, it's better to do this consistently
> >> in
> >> request protocol and admin api as well.
> >>
> >> 110. The minVersion/maxVersion for features use int64. Currently, our
> >> release version schema is major.minor.bugfix (e.g. 2.5.0). It's possible
> >> for new features to be included in minor releases too. Should we make the
> >> feature versioning match the release versioning?
> >>
> >> 111. "During regular operations, the data in the ZK node can be mutated
> >> only via a specific admin API served only by the controller." I am
> >> wondering why can't the controller auto finalize a feature version after
> >> all brokers are upgraded? For new users who download the latest version to
> >> build a new cluster, it's inconvenient for them to have to manually enable
> >> each feature.
> >>
> >> 112. DeleteFeaturesResponse: It seems the apiKey should be 49 instead of
> >> 48.
> >>
> >> Jun
> >>
> >>
> >> On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kpraka...@confluent.io>
> >> wrote:
> >>
> >> > Hey Jun,
> >> >
> >> > Thanks a lot for the great feedback! Please note that the design
> >> > has changed a little bit on the KIP, and we now propagate the finalized
> >> > features metadata only via ZK watches (instead of UpdateMetadataRequest
> >> > from the controller).
> >> >
> >> > Please find below my response to your questions/feedback, with the
> >> prefix
> >> > "(Kowshik):".
> >> >
> >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > 100.1 Since this request waits for responses from brokers, should we
> >> add
> >> > a
> >> > > timeout in the request (like createTopicRequest)?
> >> >
> >> > (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> >> > longer
> >> > wait for responses from brokers, since the design has been changed so
> >> that
> >> > the
> >> > features information is propagated via ZK. Nevertheless, it is right to
> >> > have a timeout
> >> > for the request.
> >> >
> >> > > 100.2 The response schema is a bit weird. Typically, the response just
> >> > > shows an error code and an error message, instead of echoing the
> >> request.
> >> >
> >> > (Kowshik): Great point! Yeah, I have modified it to just return an error
> >> > code and a message.
> >> > Previously it was not echoing the "request", rather it was returning the
> >> > latest set of
> >> > cluster-wide finalized features (after applying the updates). But you
> >> are
> >> > right,
> >> > the additional info is not required, so I have removed it from the
> >> response
> >> > schema.
> >> >
> >> > > 100.3 Should we add a separate request to list/describe the existing
> >> > > features?
> >> >
> >> > (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> >> > Admin API,
> >> > which, under the covers, uses the ApiVersionsRequest to list/describe
> >> the
> >> > existing features. Please read the 'Tooling support' section.
> >> >
> >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> >> > > DELETE, the version field doesn't make sense. So, I guess the broker
> >> just
> >> > > ignores this? An alternative way is to have a separate
> >> > DeleteFeaturesRequest
> >> >
> >> > (Kowshik): Great point! I have modified the KIP now to have 2 separate
> >> > controller APIs
> >> > serving these different purposes:
> >> > 1. updateFeatures
> >> > 2. deleteFeatures
> >> >
> >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> >> > > version of the metadata for finalized features." I am wondering why
> >> the
> >> > > ordering is important?
> >> >
> >> > (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> >> > version), and
> >> > it is just the ZK node version. Basically, this is the epoch for the
> >> > cluster-wide
> >> > finalized feature version metadata. This metadata is served to clients
> >> via
> >> > the
> >> > ApiVersionsResponse (for reads). We propagate updates from the
> >> '/features'
> >> > ZK node
> >> > to all brokers, via ZK watches set up by each broker on the '/features'
> >> > node.
> >> >
> >> > Now here is why the ordering is important:
> >> > ZK watches don't propagate at the same time. As a result, the
> >> > ApiVersionsResponse
> >> > is eventually consistent across brokers. This can introduce cases where
> >> > a client sees an older (lower) epoch of the features metadata after it
> >> > has already seen a more recent (higher) epoch. We expect clients to
> >> > always employ the rule that metadata with a higher epoch trumps metadata
> >> > with a lower epoch. Those clients that are external to
> >> > Kafka should strongly consider discovering the latest metadata once
> >> during
> >> > startup from the brokers, and if required refresh the metadata
> >> periodically
> >> > (to get the latest metadata).
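
In code, the "higher epoch always wins" rule could look like the following
minimal sketch (the class name and shape are hypothetical, not from the KIP):

    import java.util.Collections;
    import java.util.Map;

    class FinalizedFeaturesCache {
        private long latestEpoch = -1L;
        private Map<String, Long> finalizedFeatures = Collections.emptyMap();

        // Apply an update only if its epoch is strictly newer than what we
        // have; a lower epoch delivered out of order is simply ignored.
        synchronized void maybeUpdate(long epoch, Map<String, Long> features) {
            if (epoch <= latestEpoch) {
                return;
            }
            latestEpoch = epoch;
            finalizedFeatures = features;
        }
    }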
> >> >
> >> > > 100.6 Could you specify the required ACL for this new request?
> >> >
> >> > (Kowshik): What is ACL, and how could I find out which one to specify?
> >> > Please could you provide me some pointers? I'll be glad to update the
> >> > KIP once I know the next steps.
> >> >
> >> > > 101. For the broker registration ZK node, should we bump up the
> >> version
> >> > in
> >> > the json?
> >> >
> >> > (Kowshik): Great point! Done. I've increased the version in the broker
> >> json
> >> > by 1.
> >> >
> >> > > 102. For the /features ZK node, not sure if we need the epoch field.
> >> Each
> >> > > ZK node has an internal version field that is incremented on every
> >> > update.
> >> >
> >> > (Kowshik): Great point! Done. I'm using the ZK node version now,
> >> instead of
> >> > explicitly
> >> > incremented epoch.
> >> >
> >> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> >> is
> >> > > left to the discretion of the logic implementing the feature (ex: can
> >> be
> >> > > done via dynamic broker config)." Does that mean the broker
> >> registration
> >> > ZK
> >> > > node will be updated dynamically when this happens?
> >> >
> >> > (Kowshik): Not really. The text was just conveying that a broker could
> >> > "know" of
> >> > a new feature version, but it does not mean the broker should have also
> >> > activated the effects of the feature version. Knowing vs activation are
> >> 2
> >> > separate things,
> >> > and the latter can be achieved by dynamic config. I have reworded the
> >> text
> >> > to
> >> > make this clear to the reader.
> >> >
> >> >
> >> > > 104. UpdateMetadataRequest
> >> > > 104.1 It would be useful to describe when the feature metadata is
> >> > included
> >> > > in the request. My understanding is that it's only included if (1)
> >> there
> >> > is
> >> > > a change to the finalized feature; (2) broker restart; (3) controller
> >> > > failover.
> >> > > 104.2 The new fields have the following versions. Why are the
> >> versions 3+
> >> > > when the top version is bumped to 6?
> >> > >       "fields":  [
> >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> >> > >           "about": "The name of the feature."},
> >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >> > >           "about": "The finalized version for the feature."}
> >> > >       ]
> >> >
> >> > (Kowshik): With the new improved design, we have completely eliminated
> >> the
> >> > need to
> >> > use UpdateMetadataRequest. This is because we now rely on ZK to deliver
> >> the
> >> > notifications for changes to the '/features' ZK node.
> >> >
> >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> >> > better
> >> > > to use enable/disable?
> >> >
> >> > (Kowshik): For delete, yes, I have changed it so that we instead call it
> >> > 'disable'.
> >> > However for 'update', it can now also refer to either an upgrade or a
> >> > forced downgrade.
> >> > Therefore, I have left it the way it is, calling it just 'update'.
> >> >
> >> >
> >> > Cheers,
> >> > Kowshik
> >> >
> >> > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, Kowshik,
> >> > >
> >> > > Thanks for the KIP. Looks good overall. A few comments below.
> >> > >
> >> > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > 100.1 Since this request waits for responses from brokers, should we
> >> add
> >> > a
> >> > > timeout in the request (like createTopicRequest)?
> >> > > 100.2 The response schema is a bit weird. Typically, the response just
> >> > > shows an error code and an error message, instead of echoing the
> >> request.
> >> > > 100.3 Should we add a separate request to list/describe the existing
> >> > > features?
> >> > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> >> > > DELETE, the version field doesn't make sense. So, I guess the broker
> >> just
> >> > > ignores this? An alternative way is to have a separate
> >> > > DeleteFeaturesRequest
> >> > > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> >> > > version of the metadata for finalized features." I am wondering why
> >> the
> >> > > ordering is important?
> >> > > 100.6 Could you specify the required ACL for this new request?
> >> > >
> >> > > 101. For the broker registration ZK node, should we bump up the
> >> version
> >> > in
> >> > > the json?
> >> > >
> >> > > 102. For the /features ZK node, not sure if we need the epoch field.
> >> Each
> >> > > ZK node has an internal version field that is incremented on every
> >> > update.
> >> > >
> >> > > 103. "Enabling the actual semantics of a feature version cluster-wide
> >> is
> >> > > left to the discretion of the logic implementing the feature (ex: can
> >> be
> >> > > done via dynamic broker config)." Does that mean the broker
> >> registration
> >> > ZK
> >> > > node will be updated dynamically when this happens?
> >> > >
> >> > > 104. UpdateMetadataRequest
> >> > > 104.1 It would be useful to describe when the feature metadata is
> >> > included
> >> > > in the request. My understanding is that it's only included if (1)
> >> there
> >> > is
> >> > > a change to the finalized feature; (2) broker restart; (3) controller
> >> > > failover.
> >> > > 104.2 The new fields have the following versions. Why are the
> >> versions 3+
> >> > > when the top version is bumped to 6?
> >> > >       "fields":  [
> >> > >         {"name": "Name", "type":  "string", "versions":  "3+",
> >> > >           "about": "The name of the feature."},
> >> > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >> > >           "about": "The finalized version for the feature."}
> >> > >       ]
> >> > >
> >> > > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> >> > better
> >> > > to use enable/disable?
> >> > >
> >> > > Jun
> >> > >
> >> > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> >> kpraka...@confluent.io
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Hey Boyang,
> >> > > >
> >> > > > Thanks for the great feedback! I have updated the KIP based on your
> >> > > > feedback.
> >> > > > Please find my response below for your comments, look for sentences
> >> > > > starting
> >> > > > with "(Kowshik)" below.
> >> > > >
> >> > > >
> >> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> > > could
> >> > > > be
> >> > > > > converted as "When is it safe for the brokers to start serving new
> >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> >> > the
> >> > > > > context.
> >> > > >
> >> > > > (Kowshik): Great point! Done.
> >> > > >
> >> > > > > 2. In the *Explanation *section, the metadata version number part
> >> > > seems a
> >> > > > > bit blurred. Could you point a reference to later section that we
> >> > going
> >> > > > to
> >> > > > > store it in Zookeeper and update it every time when there is a
> >> > feature
> >> > > > > change?
> >> > > >
> >> > > > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >> > > >
> >> > > >
> >> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> >> > for
> >> > > > > features such as group coordinator semantics, there is no legal
> >> > > scenario
> >> > > > to
> >> > > > > perform a downgrade at all. So having downgrade door open is
> >> pretty
> >> > > > > error-prone as human faults happen all the time. I'm assuming as
> >> new
> >> > > > > features are implemented, it's not very hard to add a flag during
> >> > > feature
> >> > > > > creation to indicate whether this feature is "downgradable". Could
> >> > you
> >> > > > > explain a bit more on the extra engineering effort for shipping
> >> this
> >> > > KIP
> >> > > > > with downgrade protection in place?
> >> > > >
> >> > > > (Kowshik): Great point! I'd agree and disagree here. While I agree
> >> that
> >> > > > accidental
> >> > > > downgrades can cause problems, I also think sometimes downgrades
> >> should
> >> > > > be allowed for emergency reasons (not all downgrades cause issues).
> >> > > > It really depends on the feature being downgraded.
> >> > > >
> >> > > > To be more strict about feature version downgrades, I have modified
> >> the
> >> > > KIP
> >> > > > proposing that we mandate a `--force-downgrade` flag be used in the
> >> > > > UPDATE_FEATURES api
> >> > > > and the tooling, whenever the human is downgrading a finalized
> >> feature
> >> > > > version.
> >> > > > Hopefully this should cover the requirement, until we find the need
> >> for
> >> > > > advanced downgrade support.
> >> > > >
> >> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > > > defined
> >> > > > > in the broker code." So this means in order to restrict a certain
> >> > > > feature,
> >> > > > > we need to start the broker first and then send a feature gating
> >> > > request
> >> > > > > immediately, which introduces a time gap and the intended-to-close
> >> > > > feature
> >> > > > > could actually serve request during this phase. Do you think we
> >> > should
> >> > > > also
> >> > > > > support configurations as well so that admin user could freely
> >> roll
> >> > up
> >> > > a
> >> > > > > cluster with all nodes complying the same feature gating, without
> >> > > > worrying
> >> > > > > about the turnaround time to propagate the message only after the
> >> > > cluster
> >> > > > > starts up?
> >> > > >
> >> > > > (Kowshik): This is a great point/question. One of the expectations
> >> out
> >> > of
> >> > > > this KIP, which is
> >> > > > already followed in the broker, is the following.
> >> > > >  - Imagine at time T1 the broker starts up and registers its
> >> > > >    presence in ZK, along with advertising its supported features.
> >> > > >  - Imagine at a future time T2 the broker receives the
> >> > > >    UpdateMetadataRequest from the controller, which contains the
> >> > > >    latest finalized features as seen by the controller. The broker
> >> > > >    validates this data against its supported features to make sure
> >> > > >    there is no mismatch (it will shut down if there is an
> >> > > >    incompatibility).
> >> > > >
> >> > > > It is expected that during the time between the 2 events T1 and T2,
> >> > > > the broker is almost a silent entity in the cluster. It does not add
> >> > > > any value to the cluster, or carry out any important broker
> >> > > > activities. By “important”, I mean it is not mutating its persisted
> >> > > > state, not mutating critical in-memory state, and won’t be serving
> >> > > > produce/fetch requests. Note it doesn’t even know its assigned
> >> > > > partitions until it receives the UpdateMetadataRequest from the
> >> > > > controller. Anything the broker is doing up until this point is
> >> > > > neither damaging nor useful.
> >> > > >
> >> > > > I’ve clarified the above in the KIP, see this new section:
> >> > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> >> > > > .
> >> > > >
> >> > > > > 5. "adding a new Feature, updating or deleting an existing
> >> Feature",
> >> > > may
> >> > > > be
> >> > > > > I misunderstood something, I thought the features are defined in
> >> > broker
> >> > > > > code, so admin could not really create a new feature?
> >> > > >
> >> > > > (Kowshik): Great point! You understood this right. Here adding a
> >> > feature
> >> > > > means we are
> >> > > > adding a cluster-wide finalized *max* version for a feature that was
> >> > > > previously never finalized.
> >> > > > I have clarified this in the KIP now.
> >> > > >
> >> > > > > 6. I think we need a separate error code like
> >> > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > to
> >> > > > > reject a concurrent feature update request.
> >> > > >
> >> > > > (Kowshik): Great point! I have modified the KIP adding the above
> >> (see
> >> > > > 'Tooling support -> Admin API changes').
> >> > > >
> >> > > > > 7. I think we haven't discussed the alternative solution to pass
> >> the
> >> > > > > feature information through Zookeeper. Is that mentioned in the
> >> KIP
> >> > to
> >> > > > > justify why using UpdateMetadata is more favorable?
> >> > > >
> >> > > > (Kowshik): Nice question! The broker reads finalized feature info
> >> > stored
> >> > > in
> >> > > > ZK,
> >> > > > only during startup when it does a validation. When serving
> >> > > > `ApiVersionsRequest`, the
> >> > > > broker does not read this info from ZK directly. I'd imagine the
> >> risk
> >> > is
> >> > > > that it can increase
> >> > > > the ZK read QPS which can be a bottleneck for the system. Today, in
> >> > Kafka
> >> > > > we use the
> >> > > > controller to fan out ZK updates to brokers and we want to stick to
> >> > that
> >> > > > pattern to avoid
> >> > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> >> > > >
> >> > > > > 8. I was under the impression that user could configure a range of
> >> > > > > supported versions, what's the trade-off for allowing single
> >> > finalized
> >> > > > > version only?
> >> > > >
> >> > > > (Kowshik): Great question! The finalized version of a feature
> >> basically
> >> > > > refers to
> >> > > > the cluster-wide finalized feature "maximum" version. For example,
> >> if
> >> > the
> >> > > > 'group_coordinator' feature
> >> > > > has the finalized version set to 10, then, it means that
> >> cluster-wide
> >> > all
> >> > > > versions up to v10 are
> >> > > > supported for this feature. However, note that if some version (ex:
> >> v0)
> >> > > > gets deprecated
> >> > > > for this feature, then we don’t convey that using this scheme (also
> >> > > > supporting deprecation is a non-goal).
> >> > > >
> >> > > > (Kowshik): I’ve now modified the KIP at all points, referring to
> >> > finalized
> >> > > > feature "maximum" versions.
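
For illustration, the "finalized maximum version" semantics above reduces the
broker-side compatibility check to a simple range test. A rough sketch (this
is not the KIP's class, just an assumed shape using the KIP's int64 versions):

    class SupportedVersionRange {
        final long minVersion;
        final long maxVersion;

        SupportedVersionRange(long minVersion, long maxVersion) {
            this.minVersion = minVersion;
            this.maxVersion = maxVersion;
        }

        // A broker is compatible with a cluster-wide finalized max version
        // only if that value falls inside its supported range.
        boolean isCompatibleWith(long finalizedMaxVersion) {
            return minVersion <= finalizedMaxVersion
                && finalizedMaxVersion <= maxVersion;
        }
    }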
> >> > > >
> >> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > > > producer
> >> > > >
> >> > > > (Kowshik): Great point! Done.
> >> > > >
> >> > > >
> >> > > > Cheers,
> >> > > > Kowshik
> >> > > >
> >> > > >
> >> > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> >> > reluctanthero...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hey Kowshik,
> >> > > > >
> >> > > > > thanks for the revised KIP. Got a couple of questions:
> >> > > > >
> >> > > > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> >> > > could
> >> > > > be
> >> > > > > converted as "When is it safe for the brokers to start serving new
> >> > > > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in
> >> > the
> >> > > > > context.
> >> > > > >
> >> > > > > 2. In the *Explanation *section, the metadata version number part
> >> > > seems a
> >> > > > > bit blurred. Could you point a reference to later section that we
> >> > going
> >> > > > to
> >> > > > > store it in Zookeeper and update it every time when there is a
> >> > feature
> >> > > > > change?
> >> > > > >
> >> > > > > 3. For the feature downgrade, although it's a Non-goal of the KIP,
> >> > for
> >> > > > > features such as group coordinator semantics, there is no legal
> >> > > scenario
> >> > > > to
> >> > > > > perform a downgrade at all. So having downgrade door open is
> >> pretty
> >> > > > > error-prone as human faults happen all the time. I'm assuming as
> >> new
> >> > > > > features are implemented, it's not very hard to add a flag during
> >> > > feature
> >> > > > > creation to indicate whether this feature is "downgradable". Could
> >> > you
> >> > > > > explain a bit more on the extra engineering effort for shipping
> >> this
> >> > > KIP
> >> > > > > with downgrade protection in place?
> >> > > > >
> >> > > > > 4. "Each broker’s supported dictionary of feature versions will be
> >> > > > defined
> >> > > > > in the broker code." So this means in order to restrict a certain
> >> > > > feature,
> >> > > > > we need to start the broker first and then send a feature gating
> >> > > request
> >> > > > > immediately, which introduces a time gap and the intended-to-close
> >> > > > feature
> >> > > > > could actually serve request during this phase. Do you think we
> >> > should
> >> > > > also
> >> > > > > support configurations as well so that admin user could freely
> >> roll
> >> > up
> >> > > a
> >> > > > > cluster with all nodes complying the same feature gating, without
> >> > > > worrying
> >> > > > > about the turnaround time to propagate the message only after the
> >> > > cluster
> >> > > > > starts up?
> >> > > > >
> >> > > > > 5. "adding a new Feature, updating or deleting an existing
> >> Feature",
> >> > > may
> >> > > > be
> >> > > > > I misunderstood something, I thought the features are defined in
> >> > broker
> >> > > > > code, so admin could not really create a new feature?
> >> > > > >
> >> > > > > 6. I think we need a separate error code like
> >> > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > to
> >> > > > > reject a concurrent feature update request.
> >> > > > >
> >> > > > > 7. I think we haven't discussed the alternative solution to pass
> >> the
> >> > > > > feature information through Zookeeper. Is that mentioned in the
> >> KIP
> >> > to
> >> > > > > justify why using UpdateMetadata is more favorable?
> >> > > > >
> >> > > > > 8. I was under the impression that user could configure a range of
> >> > > > > supported versions, what's the trade-off for allowing single
> >> > finalized
> >> > > > > version only?
> >> > > > >
> >> > > > > 9. One minor syntax fix: Note that here the "client" here may be a
> >> > > > producer
> >> > > > >
> >> > > > > Boyang
> >> > > > >
> >> > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmcc...@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> >> > > > > > > Hi Colin,
> >> > > > > > >
> >> > > > > > > Thanks for the feedback! I've changed the KIP to address your
> >> > > > > > > suggestions.
> >> > > > > > > Please find below my explanation. Here is a link to KIP 584:
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > .
> >> > > > > > >
> >> > > > > > > 1. '__data_version__' is the version of the finalized feature
> >> > > > metadata
> >> > > > > > > (i.e. actual ZK node contents), while the
> >> '__schema_version__' is
> >> > > the
> >> > > > > > > version of the schema of the data persisted in ZK. These serve
> >> > > > > different
> >> > > > > > > purposes. '__data_version__' is useful mainly to clients
> >> > during
> >> > > > > reads,
> >> > > > > > > to differentiate between the 2 versions of eventually
> >> consistent
> >> > > > > > 'finalized
> >> > > > > > > features' metadata (i.e. larger metadata version is more
> >> recent).
> >> > > > > > > '__schema_version__' provides an additional degree of
> >> > flexibility,
> >> > > > > where
> >> > > > > > if
> >> > > > > > > we decide to change the schema for '/features' node in ZK (in
> >> the
> >> > > > > > future),
> >> > > > > > > then we can manage broker roll outs suitably (i.e.
> >> > > > > > > serialization/deserialization of the ZK data can be handled
> >> > > safely).
> >> > > > > >
> >> > > > > > Hi Kowshik,
> >> > > > > >
> >> > > > > > If you're talking about a number that lets you know if data is
> >> more
> >> > > or
> >> > > > > > less recent, we would typically call that an epoch, and not a
> >> > > version.
> >> > > > > For
> >> > > > > > the ZK data structures, the word "version" is typically reserved
> >> > for
> >> > > > > > describing changes to the overall schema of the data that is
> >> > written
> >> > > to
> >> > > > > > ZooKeeper.  We don't even really change the "version" of those
> >> > > schemas
> >> > > > > that
> >> > > > > > much, since most changes are backwards-compatible.  But we do
> >> > include
> >> > > > > that
> >> > > > > > version field just in case.
> >> > > > > >
> >> > > > > > I don't think we really need an epoch here, though, since we can
> >> > just
> >> > > > > look
> >> > > > > > at the broker epoch.  Whenever the broker registers, its epoch
> >> will
> >> > > be
> >> > > > > > greater than the previous broker epoch.  And the newly
> >> registered
> >> > > data
> >> > > > > will
> >> > > > > > take priority.  This will be a lot simpler than adding a
> >> separate
> >> > > epoch
> >> > > > > > system, I think.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 2. Regarding admin client needing min and max information -
> >> you
> >> > are
> >> > > > > > right!
> >> > > > > > > I've changed the KIP such that the Admin API also allows the
> >> user
> >> > > to
> >> > > > > read
> >> > > > > > > 'supported features' from a specific broker. Please look at
> >> the
> >> > > > section
> >> > > > > > > "Admin API changes".
> >> > > > > >
> >> > > > > > Thanks.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> >> deliberate.
> >> > > > I've
> >> > > > > > > improved the KIP to just use `long` at all places.
> >> > > > > >
> >> > > > > > Sounds good.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are right!
> >> > I've
> >> > > > > > updated
> >> > > > > > > the KIP sketching the functionality provided by this tool,
> >> with
> >> > > some
> >> > > > > > > examples. Please look at the section "Tooling support
> >> examples".
> >> > > > > > >
> >> > > > > > > Thank you!
> >> > > > > >
> >> > > > > >
> >> > > > > > Thanks, Kowshik.
> >> > > > > >
> >> > > > > > cheers,
> >> > > > > > Colin
> >> > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Cheers,
> >> > > > > > > Kowshik
> >> > > > > > >
> >> > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> >> > cmcc...@apache.org>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Thanks, Kowshik, this looks good.
> >> > > > > > > >
> >> > > > > > > > In the "Schema" section, do we really need both
> >> > > __schema_version__
> >> > > > > and
> >> > > > > > > > __data_version__?  Can we just have a single version field
> >> > here?
> >> > > > > > > >
> >> > > > > > > > Shouldn't the Admin(Client) function have some way to get
> >> the
> >> > min
> >> > > > and
> >> > > > > > max
> >> > > > > > > > information that we're exposing as well?  I guess we could
> >> have
> >> > > > min,
> >> > > > > > max,
> >> > > > > > > > and current.  Unrelated: is the use of Long rather than long
> >> > > > > deliberate
> >> > > > > > > > here?
> >> > > > > > > >
> >> > > > > > > > It would be good to describe how the command line tool
> >> > > > > > > > kafka.admin.FeatureCommand will work.  For example the flags
> >> > that
> >> > > > it
> >> > > > > > will
> >> > > > > > > > take and the output that it will generate to STDOUT.
> >> > > > > > > >
> >> > > > > > > > cheers,
> >> > > > > > > > Colin
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> >> > > > > > > > > Hi all,
> >> > > > > > > > >
> >> > > > > > > > > I've opened KIP-584, which
> >> > > > > > > > > is intended to provide a versioning scheme for features.
> >> I'd
> >> > > like
> >> > > > > to
> >> > > > > > use
> >> > > > > > > > > this thread to discuss the same. I'd appreciate any
> >> feedback
> >> > on
> >> > > > > this.
> >> > > > > > > > > Here
> >> > > > > > > > > is a link to KIP-584:
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > > >  .
> >> > > > > > > > >
> >> > > > > > > > > Thank you!
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Cheers,
> >> > > > > > > > > Kowshik
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
