Thanks Kowshik, the answers make sense. Some follow-ups:

On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the KIP. Looks good overall. A few comments below.
>
> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> 100.1 Since this request waits for responses from brokers, should we add a
> timeout in the request (like CreateTopicsRequest)?
> 100.2 The response schema is a bit weird. Typically, the response just
> shows an error code and an error message, instead of echoing the request.
> 100.3 Should we add a separate request to list/describe the existing
> features?
> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> DELETE, the version field doesn't make sense. So, I guess the broker just
> ignores this? An alternative would be to have a separate
> DeleteFeaturesRequest.
> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> version of the metadata for finalized features." I am wondering why the
> ordering is important.
> 100.6 Could you specify the required ACL for this new request?
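>
> To make 100.3 concrete, here is a rough sketch of what a read-only
> describe API could return on the admin client. All names below are
> hypothetical, just to illustrate the shape, not taken from the KIP:
>
>     import java.util.Map;
>
>     // Hypothetical sketch: a describe call returns both the cluster-wide
>     // finalized versions and the queried broker's supported ranges.
>     final class VersionRange {
>         final long minVersion;
>         final long maxVersion;
>         VersionRange(long minVersion, long maxVersion) {
>             this.minVersion = minVersion;
>             this.maxVersion = maxVersion;
>         }
>     }
>
>     final class FeatureMetadata {
>         // Feature name -> finalized max version (cluster-wide).
>         final Map<String, Long> finalizedFeatures;
>         // Feature name -> version range supported by the queried broker.
>         final Map<String, VersionRange> supportedFeatures;
>         FeatureMetadata(Map<String, Long> finalized,
>                         Map<String, VersionRange> supported) {
>             this.finalizedFeatures = finalized;
>             this.supportedFeatures = supported;
>         }
>     }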
>
> 101. For the broker registration ZK node, should we bump up the version in
> the json?
>
> 102. For the /features ZK node, not sure if we need the epoch field. Each
> ZK node has an internal version field that is incremented on every update.
>
> 103. "Enabling the actual semantics of a feature version cluster-wide is
> left to the discretion of the logic implementing the feature (ex: can be
> done via dynamic broker config)." Does that mean the broker registration ZK
> node will be updated dynamically when this happens?
>
> 104. UpdateMetadataRequest
> 104.1 It would be useful to describe when the feature metadata is included
> in the request. My understanding is that it's only included if (1) there is
> a change to the finalized features; (2) a broker restarts; (3) the
> controller fails over.
> 104.2 The new fields have the following versions. Why are the versions 3+
> when the top version is bumped to 6?
>       "fields": [
>         {"name": "Name", "type": "string", "versions": "3+",
>           "about": "The name of the feature."},
>         {"name": "Version", "type": "int64", "versions": "3+",
>           "about": "The finalized version for the feature."}
>       ]
>
> 105. kafka-features.sh: Instead of using update/delete, perhaps it's better
> to use enable/disable?
>
> Jun
>
> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kpraka...@confluent.io>
> wrote:
>
> > Hey Boyang,
> >
> > Thanks for the great feedback! I have updated the KIP based on your
> > feedback.
> > Please find my responses to your comments below; look for sentences
> > starting with "(Kowshik)".
> >
> >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> >
> > (Kowshik): Great point! Done.
> >

> > > 2. In the *Explanation* section, the metadata version number part seems
> > > a bit unclear. Could you add a reference to the later section where we
> > > are going to store it in ZooKeeper and update it every time there is a
> > > feature change?
> >
> > (Kowshik): Great point! Done. I've added a reference in the KIP.
> >
> >
> > > 3. For the feature downgrade, although it's a non-goal of the KIP, for
> > > features such as group coordinator semantics there is no legal scenario
> > > in which to perform a downgrade at all. So leaving the downgrade door
> > > open is pretty error-prone, as human faults happen all the time. I'm
> > > assuming that as new features are implemented, it's not very hard to add
> > > a flag during feature creation to indicate whether the feature is
> > > "downgradable". Could you explain a bit more about the extra engineering
> > > effort of shipping this KIP with downgrade protection in place?
> >
> > (Kowshik): Great point! I'd agree and disagree here. While I agree that
> > accidental downgrades can cause problems, I also think downgrades should
> > sometimes be allowed for emergency reasons (not all downgrades cause
> > issues); it just depends on the feature being downgraded.
> >
> > To be stricter about feature version downgrades, I have modified the KIP
> > to propose that we mandate a `--force-downgrade` flag be used in the
> > UPDATE_FEATURES API and the tooling whenever a human downgrades a
> > finalized feature version. Hopefully this covers the requirement, until
> > we find a need for more advanced downgrade support.
> >
>
+1 for adding this flag.
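
To make the flag concrete, a client-side update could look roughly like the
sketch below. This is only my sketch of the shape, with hypothetical names
(FeatureUpdate, forceDowngrade); the KIP defines the actual API:

    // Hypothetical sketch: an explicit flag forces the caller to opt in
    // before a finalized max version is allowed to move backwards.
    final class FeatureUpdate {
        final String featureName;
        final long maxVersionLevel;
        final boolean forceDowngrade; // must be true to lower the version

        FeatureUpdate(String featureName, long maxVersionLevel,
                      boolean forceDowngrade) {
            this.featureName = featureName;
            this.maxVersionLevel = maxVersionLevel;
            this.forceDowngrade = forceDowngrade;
        }
    }

    class Examples {
        void examples() {
            // Upgrading never needs the flag; a downgrade request without
            // the flag should be rejected.
            FeatureUpdate upgrade =
                new FeatureUpdate("group_coordinator", 11, false);
            FeatureUpdate downgrade =
                new FeatureUpdate("group_coordinator", 9, true);
        }
    }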

> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> >
> > (Kowshik): This is a great point/question. One of the expectations behind
> > this KIP, and one the broker already follows, is the following:
> >  - Imagine at time T1 the broker starts up and registers its presence in
> >    ZK, along with advertising its supported features.
> >  - Imagine at a future time T2 the broker receives the
> >    UpdateMetadataRequest from the controller, which contains the latest
> >    finalized features as seen by the controller. The broker validates
> >    this data against its supported features to make sure there is no
> >    mismatch (it will shut down if there is an incompatibility).
> >
> > It is expected that during the time between the two events T1 and T2, the
> > broker is almost a silent entity in the cluster. It does not add any
> > value to the cluster, nor carry out any important broker activities. By
> > “important”, I mean it is not doing mutations on its persistence, not
> > mutating critical in-memory state, and won’t be serving produce/fetch
> > requests. Note that it doesn’t even know its assigned partitions until it
> > receives the UpdateMetadataRequest from the controller. Anything the
> > broker does up until this point is neither damaging nor useful.
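> >
> > As a rough sketch (my pseudocode with illustrative names, not the KIP's
> > exact logic), the validation at T2 could look like the following, where
> > VersionRange is just a {minVersion, maxVersion} pair:
> >
> >     // Returns false if any finalized feature falls outside the version
> >     // range this broker supports; the caller then shuts the broker down.
> >     boolean isCompatible(Map<String, Long> finalizedMaxVersions,
> >                          Map<String, VersionRange> supportedFeatures) {
> >         for (Map.Entry<String, Long> e : finalizedMaxVersions.entrySet()) {
> >             VersionRange supported = supportedFeatures.get(e.getKey());
> >             if (supported == null
> >                 || e.getValue() < supported.minVersion
> >                 || e.getValue() > supported.maxVersion) {
> >                 return false;
> >             }
> >         }
> >         return true;
> >     }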
> >
> > I’ve clarified the above in the KIP, see this new section:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>
My point is that during the bootstrapping stage of a cluster, we cannot
pick the desired feature version, as no controller is actively handling our
request. But anyway, I think this is a rare case to discuss, and the added
paragraph looks good :)


> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> >
> > (Kowshik): Great point! You understood this right. Here, adding a feature
> > means adding a cluster-wide finalized *max* version for a feature that
> > was previously never finalized. I have clarified this in the KIP now.
> >
> > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS to reject a concurrent feature update
> > > request.
> >
> > (Kowshik): Great point! I have modified the KIP adding the above (see
> > 'Tooling support -> Admin API changes').
> >
> > > 7. I think we haven't discussed the alternative solution of passing the
> > > feature information through ZooKeeper. Is that mentioned in the KIP, to
> > > justify why using UpdateMetadata is more favorable?
> >
> > (Kowshik): Nice question! The broker reads the finalized feature info
> > stored in ZK only during startup, when it does a validation. When serving
> > `ApiVersionsRequest`, the broker does not read this info from ZK
> > directly. I'd imagine the risk is that it could increase the ZK read QPS,
> > which can be a bottleneck for the system. Today in Kafka we use the
> > controller to fan out ZK updates to brokers, and we want to stick to that
> > pattern to avoid a ZK read bottleneck when serving `ApiVersionsRequest`.
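> >
> > In other words (my sketch, with hypothetical names), the broker keeps an
> > in-memory copy of the finalized features that is written only when an
> > UpdateMetadataRequest arrives, so the ApiVersionsRequest path never
> > touches ZK:
> >
> >     import java.util.Collections;
> >     import java.util.Map;
> >
> >     // Sketch: ZK is read once at startup; afterwards the controller's
> >     // UpdateMetadataRequest refreshes this cache.
> >     final class FinalizedFeatureCache {
> >         private volatile Map<String, Long> finalizedMaxVersions =
> >             Collections.emptyMap();
> >
> >         // Called on receipt of an UpdateMetadataRequest.
> >         void update(Map<String, Long> latest) {
> >             this.finalizedMaxVersions = latest;
> >         }
> >
> >         // Called on the ApiVersionsRequest path; no ZK read involved.
> >         Map<String, Long> current() {
> >             return finalizedMaxVersions;
> >         }
> >     }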
>
Feature changes should happen at roughly the same frequency as config
changes. Today, dynamic configuration changes are propagated via ZooKeeper.
So I guess propagating through UpdateMetadata doesn't buy us any extra
benefit, while going through ZK notifications would be a simpler solution.

> > > 8. I was under the impression that the user could configure a range of
> > > supported versions; what's the trade-off of allowing only a single
> > > finalized version?
> >
> > (Kowshik): Great question! The finalized version of a feature basically
> > refers to the cluster-wide finalized feature "maximum" version. For
> > example, if the 'group_coordinator' feature has the finalized version set
> > to 10, then it means that, cluster-wide, all versions up to v10 are
> > supported for this feature. However, note that if some version (ex: v0)
> > gets deprecated for this feature, then we don’t convey that using this
> > scheme (supporting deprecation is also a non-goal).
> >
> > (Kowshik): I’ve now modified the KIP to refer to finalized feature
> > "maximum" versions throughout.
> >
>
Understood. I don't feel strongly about deprecation, but does the current
KIP keep the door open for future improvements, in case someone needs
feature deprecation later? Could we briefly discuss it in the future work
section? One possible shape is sketched below.
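
This is purely my speculation, not anything in the KIP: a finalized feature
could later carry a min version level alongside the max, so that deprecating
old versions just means raising the min bound:

    // Speculative sketch (hypothetical names, not from the KIP).
    final class FinalizedFeature {
        final long minVersionLevel; // versions below this are deprecated
        final long maxVersionLevel; // versions up to this are enabled

        FinalizedFeature(long minVersionLevel, long maxVersionLevel) {
            this.minVersionLevel = minVersionLevel;
            this.maxVersionLevel = maxVersionLevel;
        }

        boolean isEnabled(long version) {
            return version >= minVersionLevel && version <= maxVersionLevel;
        }
    }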


> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> >
> > (Kowshik): Great point! Done.
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Hey Kowshik,
> > >
> > > thanks for the revised KIP. Got a couple of questions:
> > >
> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
> could
> > be
> > > converted as "When is it safe for the brokers to start serving new
> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
> > > context.
> > >
> > > 2. In the *Explanation* section, the metadata version number part seems
> > > a bit unclear. Could you add a reference to the later section where we
> > > are going to store it in ZooKeeper and update it every time there is a
> > > feature change?
> > >
> > > 3. For the feature downgrade, although it's a non-goal of the KIP, for
> > > features such as group coordinator semantics there is no legal scenario
> > > in which to perform a downgrade at all. So leaving the downgrade door
> > > open is pretty error-prone, as human faults happen all the time. I'm
> > > assuming that as new features are implemented, it's not very hard to add
> > > a flag during feature creation to indicate whether the feature is
> > > "downgradable". Could you explain a bit more about the extra engineering
> > > effort of shipping this KIP with downgrade protection in place?
> > >
> > > 4. "Each broker’s supported dictionary of feature versions will be
> > defined
> > > in the broker code." So this means in order to restrict a certain
> > feature,
> > > we need to start the broker first and then send a feature gating
> request
> > > immediately, which introduces a time gap and the intended-to-close
> > feature
> > > could actually serve request during this phase. Do you think we should
> > also
> > > support configurations as well so that admin user could freely roll up
> a
> > > cluster with all nodes complying the same feature gating, without
> > worrying
> > > about the turnaround time to propagate the message only after the
> cluster
> > > starts up?
> > >
> > > 5. "adding a new Feature, updating or deleting an existing Feature",
> may
> > be
> > > I misunderstood something, I thought the features are defined in broker
> > > code, so admin could not really create a new feature?
> > >
> > > 6. I think we need a separate error code like
> > > FEATURE_UPDATE_IN_PROGRESS to reject a concurrent feature update
> > > request.
> > >
> > > 7. I think we haven't discussed the alternative solution of passing the
> > > feature information through ZooKeeper. Is that mentioned in the KIP, to
> > > justify why using UpdateMetadata is more favorable?
> > >
> > > 8. I was under the impression that the user could configure a range of
> > > supported versions; what's the trade-off of allowing only a single
> > > finalized version?
> > >
> > > 9. One minor syntax fix: Note that here the "client" here may be a
> > > producer
> > >
> > > Boyang
> > >
> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > >
> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > Hi Colin,
> > > > >
> > > > > Thanks for the feedback! I've changed the KIP to address your
> > > > > suggestions.
> > > > > Please find below my explanation. Here is a link to KIP 584:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > >
> > > > > 1. '__data_version__' is the version of the finalized feature
> > > > > metadata (i.e. the actual ZK node contents), while
> > > > > '__schema_version__' is the version of the schema of the data
> > > > > persisted in ZK. These serve different purposes. '__data_version__'
> > > > > is useful mainly to clients during reads, to differentiate between
> > > > > two versions of the eventually consistent 'finalized features'
> > > > > metadata (i.e. the larger metadata version is more recent).
> > > > > '__schema_version__' provides an additional degree of flexibility:
> > > > > if we decide to change the schema of the '/features' node in ZK (in
> > > > > the future), then we can manage broker rollouts suitably (i.e.
> > > > > serialization/deserialization of the ZK data can be handled safely).
> > > >
> > > > Hi Kowshik,
> > > >
> > > > If you're talking about a number that lets you know if data is more
> > > > or less recent, we would typically call that an epoch, not a version.
> > > > For the ZK data structures, the word "version" is typically reserved
> > > > for describing changes to the overall schema of the data that is
> > > > written to ZooKeeper.  We don't really change the "version" of those
> > > > schemas much, since most changes are backwards-compatible.  But we do
> > > > include that version field just in case.
> > > >
> > > > I don't think we really need an epoch here, though, since we can just
> > > > look at the broker epoch.  Whenever the broker registers, its epoch
> > > > will be greater than the previous broker epoch, and the newly
> > > > registered data will take priority.  This will be a lot simpler than
> > > > adding a separate epoch system, I think.
> > > >
> > > > >
> > > > > 2. Regarding the admin client needing min and max information - you
> > > > > are right! I've changed the KIP so that the Admin API also allows
> > > > > the user to read 'supported features' from a specific broker.
> > > > > Please look at the section "Admin API changes".
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
> > > > > I've updated the KIP to use `long` everywhere.
> > > >
> > > > Sounds good.
> > > >
> > > > >
> > > > > 4. Regarding the kafka.admin.FeatureCommand tool - you are right!
> > > > > I've updated the KIP to sketch the functionality provided by this
> > > > > tool, with some examples. Please look at the section "Tooling
> > > > > support examples".
> > > > >
> > > > > Thank you!
> > > >
> > > >
> > > > Thanks, Kowshik.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmcc...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks, Kowshik, this looks good.
> > > > > >
> > > > > > In the "Schema" section, do we really need both
> __schema_version__
> > > and
> > > > > > __data_version__?  Can we just have a single version field here?
> > > > > >
> > > > > > Shouldn't the Admin(Client) function have some way to get the min
> > > > > > and max information that we're exposing as well?  I guess we
> > > > > > could have min, max, and current.  Unrelated: is the use of Long
> > > > > > rather than long deliberate here?
> > > > > >
> > > > > > It would be good to describe how the command-line tool
> > > > > > kafka.admin.FeatureCommand will work.  For example, the flags
> > > > > > that it will take and the output that it will generate to STDOUT.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've opened KIP-584
> > > > > > > <https://issues.apache.org/jira/browse/KIP-584>, which is
> > > > > > > intended to provide a versioning scheme for features. I'd like
> > > > > > > to use this thread to discuss it; I'd appreciate any feedback.
> > > > > > > Here is a link to KIP-584:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
