Hi all,

Any other feedback on this KIP before we start the vote?


Cheers,
Kowshik

On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <kpraka...@confluent.io>
wrote:

> Hey Jun,
>
> Thanks a lot for the great feedback! Please note that the design in the
> KIP has changed a little: we now propagate the finalized features
> metadata only via ZK watches (instead of via UpdateMetadataRequest from
> the controller).
>
> Please find below my response to your questions/feedback, with the prefix
> "(Kowshik):".
>
> > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > 100.1 Since this request waits for responses from brokers, should we add
> > a timeout in the request (like createTopicRequest)?
>
> (Kowshik): Great point! Done. I have added a timeout field. Note: we no
> longer
> wait for responses from brokers, since the design has been changed so that
> the
> features information is propagated via ZK. Nevertheless, it is right to
> have a timeout
> for the request.
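>
> For illustration, the new field could look roughly like the following (a
> sketch mirroring how requests such as CreateTopics declare their timeout;
> the exact name and versions are spelled out in the KIP):
>
>       {"name": "timeoutMs", "type": "int32", "versions": "0+",
>         "about": "How long to wait in milliseconds before timing out the request."}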
>
> > 100.2 The response schema is a bit weird. Typically, the response just
> > shows an error code and an error message, instead of echoing the request.
>
> (Kowshik): Great point! Yeah, I have modified it to just return an error
> code and a message.
> Previously it was not echoing the "request", rather it was returning the
> latest set of
> cluster-wide finalized features (after applying the updates). But you are
> right,
> the additional info is not required, so I have removed it from the
> response schema.
>
> > 100.3 Should we add a separate request to list/describe the existing
> > features?
>
> (Kowshik): This is already present in the KIP via the 'DescribeFeatures'
> Admin API, which, under the covers, uses the ApiVersionsRequest to
> list/describe the existing features. Please read the 'Tooling support'
> section.
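>
> As a rough sketch of the call site (the 'describeFeatures' name follows
> the KIP's tooling section, but the surrounding code is illustrative only):
>
>   import org.apache.kafka.clients.admin.Admin;
>
>   try (Admin admin = Admin.create(props)) {
>       // Under the covers this issues an ApiVersionsRequest to a broker
>       // and surfaces the supported + finalized features it reports.
>       DescribeFeaturesResult features = admin.describeFeatures();
>   }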
>
> > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
> > DELETE, the version field doesn't make sense. So, I guess the broker just
> > ignores this? An alternative way is to have a separate
> > DeleteFeaturesRequest.
>
> (Kowshik): Great point! I have modified the KIP now to have 2 separate
> controller APIs
> serving these different purposes:
> 1. updateFeatures
> 2. deleteFeatures
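>
> A minimal sketch of how the two calls could surface in the admin client
> (method names follow the list above; the argument types are illustrative,
> not the finalized KIP interface):
>
>   // 1. Finalize (or upgrade) the max version of a feature cluster-wide.
>   admin.updateFeatures(Map.of("group_coordinator", (short) 10));
>
>   // 2. Remove the finalized version of a feature cluster-wide.
>   admin.deleteFeatures(Set.of("group_coordinator"));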
>
> > 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
> > version of the metadata for finalized features." I am wondering why the
> > ordering is important?
>
> (Kowshik): In the latest KIP write-up, it is called epoch (instead of
> version), and
> it is just the ZK node version. Basically, this is the epoch for the
> cluster-wide
> finalized feature version metadata. This metadata is served to clients via
> the
> ApiVersionsResponse (for reads). We propagate updates from the '/features'
> ZK node to all brokers, via ZK watches set up by each broker on the
> '/features' node.
>
> Now here is why the ordering is important:
> ZK watches don't propagate at the same time. As a result, the
> ApiVersionsResponse is eventually consistent across brokers. This can
> introduce cases where a client sees an older (lower) epoch of the features
> metadata after a more recent (higher) epoch was returned at a previous
> point in time. We expect clients to always apply the rule that the highest
> epoch received so far trumps any older, smaller epoch. Clients that are
> external to Kafka should strongly consider discovering the latest metadata
> once during startup from the brokers, and refreshing it periodically
> thereafter if required.
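>
> A sketch of the client-side rule (illustrative code, not part of the KIP):
>
>   // Keep only the freshest finalized-features metadata: a received
>   // epoch smaller than the latest one seen so far is discarded.
>   private long latestEpoch = -1L;
>   private Map<String, Long> finalizedFeatures = Map.of();
>
>   synchronized void maybeUpdate(long epoch, Map<String, Long> incoming) {
>       if (epoch > latestEpoch) {  // higher epoch always trumps lower
>           latestEpoch = epoch;
>           finalizedFeatures = incoming;
>       }
>   }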
>
> > 100.6 Could you specify the required ACL for this new request?
>
> (Kowshik): Which ACL is required here, and how could I find out which one
> to specify? Could you please provide me some pointers? I'll be glad to
> update the KIP once I know the next steps.
>
> > 101. For the broker registration ZK node, should we bump up the version
> > in the json?
>
> (Kowshik): Great point! Done. I've increased the version in the broker
> json by 1.
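>
> For example, the registration data could end up looking roughly like this
> (a sketch only; the authoritative field layout, version number and
> feature ranges are in the KIP):
>
>   {
>     "version": 5,
>     "host": "broker1.example.com",
>     "port": 9092,
>     "features": {
>       "group_coordinator": {"min_version": 1, "max_version": 10}
>     }
>   }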
>
> > 102. For the /features ZK node, not sure if we need the epoch field. Each
> > ZK node has an internal version field that is incremented on every update.
>
> (Kowshik): Great point! Done. I'm using the ZK node version now, instead
> of an explicitly incremented epoch.
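>
> As a sketch using the plain ZooKeeper client API (the calls are real, but
> the read path shown is only illustrative of the idea):
>
>   import org.apache.zookeeper.data.Stat;
>
>   Stat stat = new Stat();
>   byte[] data = zk.getData("/features", watcher, stat);
>   // The znode's own version acts as the features epoch; ZK increments it
>   // automatically on every update, so no explicit counter is stored.
>   int featuresEpoch = stat.getVersion();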
>
> > 103. "Enabling the actual semantics of a feature version cluster-wide is
> > left to the discretion of the logic implementing the feature (ex: can be
> > done via dynamic broker config)." Does that mean the broker registration
> ZK
> > node will be updated dynamically when this happens?
>
> (Kowshik): Not really. The text was just conveying that a broker could
> "know" of a new feature version without having also activated its effects.
> Knowing vs. activating are 2 separate things, and the latter can be
> achieved via dynamic config. I have reworded the text to make this clear
> to the reader.
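>
> As a hypothetical example of the activation step (the config name below
> is invented purely for illustration):
>
>   # Knowing: the finalized feature version has been propagated via ZK.
>   # Activating: the feature's logic is switched on via a dynamic config.
>   bin/kafka-configs.sh --bootstrap-server localhost:9092 \
>     --entity-type brokers --entity-default --alter \
>     --add-config group.coordinator.new.protocol.enable=true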
>
>
> > 104. UpdateMetadataRequest
> > 104.1 It would be useful to describe when the feature metadata is
> > included in the request. My understanding is that it's only included if
> > (1) there is a change to the finalized feature; (2) broker restart; (3)
> > controller failover.
> > 104.2 The new fields have the following versions. Why are the versions 3+
> > when the top version is bumped to 6?
> >       "fields":  [
> >         {"name": "Name", "type":  "string", "versions":  "3+",
> >           "about": "The name of the feature."},
> >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> >           "about": "The finalized version for the feature."}
> >       ]
>
> (Kowshik): With the new improved design, we have completely eliminated the
> need to
> use UpdateMetadataRequest. This is because we now rely on ZK to deliver the
> notifications for changes to the '/features' ZK node.
>
> > 105. kafka-features.sh: Instead of using update/delete, perhaps it's
> > better to use enable/disable?
>
> (Kowshik): For delete, yes, I have changed it so that we instead call it
> 'disable'. However, 'update' can now refer to either an upgrade or a
> forced downgrade. Therefore, I have left it as is, calling it just
> 'update'.
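>
> To illustrate the resulting command shapes (a sketch; the exact flags are
> spelled out in the KIP's 'Tooling support' section):
>
>   # 'update' covers both an upgrade and a forced downgrade.
>   bin/kafka-features.sh --bootstrap-server localhost:9092 \
>     --update group_coordinator --version 10
>
>   # 'disable' removes the finalized version of a feature cluster-wide.
>   bin/kafka-features.sh --bootstrap-server localhost:9092 \
>     --disable group_coordinator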
>
>
> Cheers,
> Kowshik
>
> On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Kowshik,
>>
>> Thanks for the KIP. Looks good overall. A few comments below.
>>
>> 100. UpdateFeaturesRequest/UpdateFeaturesResponse
>> 100.1 Since this request waits for responses from brokers, should we add a
>> timeout in the request (like createTopicRequest)?
>> 100.2 The response schema is a bit weird. Typically, the response just
>> shows an error code and an error message, instead of echoing the request.
>> 100.3 Should we add a separate request to list/describe the existing
>> features?
>> 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single request. For
>> DELETE, the version field doesn't make sense. So, I guess the broker just
>> ignores this? An alternative way is to have a separate
>> DeleteFeaturesRequest
>> 100.5 In UpdateFeaturesResponse, we have "The monotonically increasing
>> version of the metadata for finalized features." I am wondering why the
>> ordering is important?
>> 100.6 Could you specify the required ACL for this new request?
>>
>> 101. For the broker registration ZK node, should we bump up the version in
>> the json?
>>
>> 102. For the /features ZK node, not sure if we need the epoch field. Each
>> ZK node has an internal version field that is incremented on every update.
>>
>> 103. "Enabling the actual semantics of a feature version cluster-wide is
>> left to the discretion of the logic implementing the feature (ex: can be
>> done via dynamic broker config)." Does that mean the broker registration
>> ZK
>> node will be updated dynamically when this happens?
>>
>> 104. UpdateMetadataRequest
>> 104.1 It would be useful to describe when the feature metadata is included
>> in the request. My understanding is that it's only included if (1) there
>> is
>> a change to the finalized feature; (2) broker restart; (3) controller
>> failover.
>> 104.2 The new fields have the following versions. Why are the versions 3+
>> when the top version is bumped to 6?
>>       "fields":  [
>>         {"name": "Name", "type":  "string", "versions":  "3+",
>>           "about": "The name of the feature."},
>>         {"name":  "Version", "type":  "int64", "versions":  "3+",
>>           "about": "The finalized version for the feature."}
>>       ]
>>
>> 105. kafka-features.sh: Instead of using update/delete, perhaps it's
>> better
>> to use enable/disable?
>>
>> Jun
>>
>> On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <kpraka...@confluent.io>
>> wrote:
>>
>> > Hey Boyang,
>> >
>> > Thanks for the great feedback! I have updated the KIP based on your
>> > feedback.
>> > Please find my response below for your comments, look for sentences
>> > starting
>> > with "(Kowshik)" below.
>> >
>> >
>> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> could
>> > be
>> > > converted as "When is it safe for the brokers to start serving new
>> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
>> > > context.
>> >
>> > (Kowshik): Great point! Done.
>> >
>> > > 2. In the *Explanation* section, the metadata version number part seems
>> > > a bit blurred. Could you point a reference to a later section where we
>> > > are going to store it in Zookeeper and update it every time there is a
>> > > feature change?
>> >
>> > (Kowshik): Great point! Done. I've added a reference in the KIP.
>> >
>> >
>> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
>> > > features such as group coordinator semantics, there is no legal
>> > > scenario to perform a downgrade at all. So leaving the downgrade door
>> > > open is pretty error-prone, as human faults happen all the time. I'm
>> > > assuming that as new features are implemented, it's not very hard to
>> > > add a flag during feature creation to indicate whether this feature is
>> > > "downgradable". Could you explain a bit more on the extra engineering
>> > > effort for shipping this KIP with downgrade protection in place?
>> >
>> > (Kowshik): Great point! I'd agree and disagree here. While I agree that
>> > accidental downgrades can cause problems, I also think downgrades should
>> > sometimes be allowed for emergency reasons (not all downgrades cause
>> > issues). It depends on the feature being downgraded.
>> >
>> > To be more strict about feature version downgrades, I have modified the
>> > KIP, proposing that we mandate a `--force-downgrade` flag be used in the
>> > UPDATE_FEATURES api and the tooling, whenever a human is downgrading a
>> > finalized feature version. Hopefully this should cover the requirement,
>> > until we find the need for advanced downgrade support.
>> >
>> > > 4. "Each broker’s supported dictionary of feature versions will be
>> > defined
>> > > in the broker code." So this means in order to restrict a certain
>> > feature,
>> > > we need to start the broker first and then send a feature gating
>> request
>> > > immediately, which introduces a time gap and the intended-to-close
>> > feature
>> > > could actually serve request during this phase. Do you think we should
>> > also
>> > > support configurations as well so that admin user could freely roll
>> up a
>> > > cluster with all nodes complying the same feature gating, without
>> > worrying
>> > > about the turnaround time to propagate the message only after the
>> cluster
>> > > starts up?
>> >
>> > (Kowshik): This is a great point/question. One of the expectations out
>> > of this KIP, which is already followed in the broker, is the following.
>> >  - Imagine at time T1 the broker starts up and registers its presence
>> >    in ZK, along with advertising its supported features.
>> >  - Imagine at a future time T2 the broker receives the
>> >    UpdateMetadataRequest from the controller, which contains the latest
>> >    finalized features as seen by the controller. The broker validates
>> >    this data against its supported features to make sure there is no
>> >    mismatch (it will shut down if there is an incompatibility).
>> >
>> > It is expected that during the time between the 2 events T1 and T2, the
>> > broker is almost a silent entity in the cluster. It does not add any
>> > value to the cluster, or carry out any important broker activities. By
>> > “important”, I mean it is not doing mutations on its persistence, not
>> > mutating critical in-memory state, and won’t be serving produce/fetch
>> > requests. Note that it doesn’t even know its assigned partitions until
>> > it receives the UpdateMetadataRequest from the controller. Anything the
>> > broker is doing up until this point is neither damaging nor useful.
>> >
>> > I’ve clarified the above in the KIP, see this new section:
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
>> >
>> > > 5. "adding a new Feature, updating or deleting an existing Feature",
>> may
>> > be
>> > > I misunderstood something, I thought the features are defined in
>> broker
>> > > code, so admin could not really create a new feature?
>> >
>> > (Kowshik): Great point! You understood this right. Here adding a feature
>> > means we are
>> > adding a cluster-wide finalized *max* version for a feature that was
>> > previously never finalized.
>> > I have clarified this in the KIP now.
>> >
>> > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS to reject a concurrent feature update
>> > > request.
>> >
>> > (Kowshik): Great point! I have modified the KIP adding the above (see
>> > 'Tooling support -> Admin API changes').
>> >
>> > > 7. I think we haven't discussed the alternative solution to pass the
>> > > feature information through Zookeeper. Is that mentioned in the KIP to
>> > > justify why using UpdateMetadata is more favorable?
>> >
>> > (Kowshik): Nice question! The broker reads the finalized feature info
>> > stored in ZK only during startup, when it does a validation. When
>> > serving `ApiVersionsRequest`, the broker does not read this info from ZK
>> > directly. I'd imagine the risk is that it can increase the ZK read QPS,
>> > which can be a bottleneck for the system. Today in Kafka we use the
>> > controller to fan out ZK updates to brokers, and we want to stick to
>> > that pattern to avoid the ZK read bottleneck when serving
>> > `ApiVersionsRequest`.
>> >
>> > > 8. I was under the impression that user could configure a range of
>> > > supported versions, what's the trade-off for allowing single finalized
>> > > version only?
>> >
>> > (Kowshik): Great question! The finalized version of a feature basically
>> > refers to the cluster-wide finalized feature "maximum" version. For
>> > example, if the 'group_coordinator' feature has the finalized version
>> > set to 10, then it means that cluster-wide all versions up to v10 are
>> > supported for this feature. However, note that if some version (ex: v0)
>> > gets deprecated for this feature, then we don’t convey that using this
>> > scheme (supporting deprecation is also a non-goal).
>> >
>> > (Kowshik): I’ve now modified the KIP at all points, referring to
>> > finalized feature "maximum" versions.
>> >
>> > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > producer
>> >
>> > (Kowshik): Great point! Done.
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> >
>> > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <reluctanthero...@gmail.com>
>> > wrote:
>> >
>> > > Hey Kowshik,
>> > >
>> > > thanks for the revised KIP. Got a couple of questions:
>> > >
>> > > 1. "When is it safe for the brokers to begin handling EOS traffic"
>> could
>> > be
>> > > converted as "When is it safe for the brokers to start serving new
>> > > Exactly-Once(EOS) semantics" since EOS is not explained earlier in the
>> > > context.
>> > >
>> > > 2. In the *Explanation* section, the metadata version number part seems
>> > > a bit blurred. Could you point a reference to a later section where we
>> > > are going to store it in Zookeeper and update it every time there is a
>> > > feature change?
>> > >
>> > > 3. For the feature downgrade, although it's a Non-goal of the KIP, for
>> > > features such as group coordinator semantics, there is no legal
>> > > scenario to perform a downgrade at all. So leaving the downgrade door
>> > > open is pretty error-prone, as human faults happen all the time. I'm
>> > > assuming that as new features are implemented, it's not very hard to
>> > > add a flag during feature creation to indicate whether this feature is
>> > > "downgradable". Could you explain a bit more on the extra engineering
>> > > effort for shipping this KIP with downgrade protection in place?
>> > >
>> > > 4. "Each broker’s supported dictionary of feature versions will be
>> > defined
>> > > in the broker code." So this means in order to restrict a certain
>> > feature,
>> > > we need to start the broker first and then send a feature gating
>> request
>> > > immediately, which introduces a time gap and the intended-to-close
>> > feature
>> > > could actually serve request during this phase. Do you think we should
>> > also
>> > > support configurations as well so that admin user could freely roll
>> up a
>> > > cluster with all nodes complying the same feature gating, without
>> > worrying
>> > > about the turnaround time to propagate the message only after the
>> cluster
>> > > starts up?
>> > >
>> > > 5. "adding a new Feature, updating or deleting an existing Feature",
>> may
>> > be
>> > > I misunderstood something, I thought the features are defined in
>> broker
>> > > code, so admin could not really create a new feature?
>> > >
>> > > 6. I think we need a separate error code like
>> > > FEATURE_UPDATE_IN_PROGRESS to reject a concurrent feature update
>> > > request.
>> > >
>> > > 7. I think we haven't discussed the alternative solution to pass the
>> > > feature information through Zookeeper. Is that mentioned in the KIP to
>> > > justify why using UpdateMetadata is more favorable?
>> > >
>> > > 8. I was under the impression that user could configure a range of
>> > > supported versions, what's the trade-off for allowing single finalized
>> > > version only?
>> > >
>> > > 9. One minor syntax fix: Note that here the "client" here may be a
>> > producer
>> > >
>> > > Boyang
>> > >
>> > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <cmcc...@apache.org>
>> > > wrote:
>> > >
>> > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
>> > > > > Hi Colin,
>> > > > >
>> > > > > Thanks for the feedback! I've changed the KIP to address your
>> > > > > suggestions.
>> > > > > Please find below my explanation. Here is a link to KIP 584:
>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > >
>> > > > > 1. '__data_version__' is the version of the finalized feature
>> > > > > metadata (i.e. actual ZK node contents), while '__schema_version__'
>> > > > > is the version of the schema of the data persisted in ZK. These
>> > > > > serve different purposes. '__data_version__' is useful mainly to
>> > > > > clients during reads, to differentiate between 2 versions of the
>> > > > > eventually consistent 'finalized features' metadata (i.e. a larger
>> > > > > metadata version is more recent). '__schema_version__' provides an
>> > > > > additional degree of flexibility, where if we decide to change the
>> > > > > schema for the '/features' node in ZK (in the future), then we can
>> > > > > manage broker rollouts suitably (i.e. serialization/deserialization
>> > > > > of the ZK data can be handled safely).
>> > > >
>> > > > Hi Kowshik,
>> > > >
>> > > > If you're talking about a number that lets you know whether data is
>> > > > more or less recent, we would typically call that an epoch, and not a
>> > > > version. For the ZK data structures, the word "version" is typically
>> > > > reserved for describing changes to the overall schema of the data
>> > > > that is written to ZooKeeper.  We don't even really change the
>> > > > "version" of those schemas that much, since most changes are
>> > > > backwards-compatible.  But we do include that version field just in
>> > > > case.
>> > > >
>> > > > I don't think we really need an epoch here, though, since we can just
>> > > > look at the broker epoch.  Whenever the broker registers, its epoch
>> > > > will be greater than the previous broker epoch.  And the newly
>> > > > registered data will take priority.  This will be a lot simpler than
>> > > > adding a separate epoch system, I think.
>> > > >
>> > > > >
>> > > > > 2. Regarding the admin client needing min and max information - you
>> > > > > are right! I've changed the KIP such that the Admin API also allows
>> > > > > the user to read 'supported features' from a specific broker.
>> > > > > Please look at the section "Admin API changes".
>> > > >
>> > > > Thanks.
>> > > >
>> > > > >
>> > > > > 3. Regarding the use of `long` vs `Long` - it was not deliberate.
>> > > > > I've improved the KIP to just use `long` in all places.
>> > > >
>> > > > Sounds good.
>> > > >
>> > > > >
>> > > > > 4. Regarding the kafka.admin.FeatureCommand tool - you are right!
>> > > > > I've updated the KIP, sketching the functionality provided by this
>> > > > > tool with some examples. Please look at the section "Tooling
>> > > > > support examples".
>> > > > >
>> > > > > Thank you!
>> > > >
>> > > >
>> > > > Thanks, Kowshik.
>> > > >
>> > > > cheers,
>> > > > Colin
>> > > >
>> > > > >
>> > > > >
>> > > > > Cheers,
>> > > > > Kowshik
>> > > > >
>> > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <cmcc...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks, Kowshik, this looks good.
>> > > > > >
>> > > > > > In the "Schema" section, do we really need both
>> __schema_version__
>> > > and
>> > > > > > __data_version__?  Can we just have a single version field here?
>> > > > > >
>> > > > > > Shouldn't the Admin(Client) function have some way to get the
>> min
>> > and
>> > > > max
>> > > > > > information that we're exposing as well?  I guess we could have
>> > min,
>> > > > max,
>> > > > > > and current.  Unrelated: is the use of Long rather than long
>> > > deliberate
>> > > > > > here?
>> > > > > >
>> > > > > > It would be good to describe how the command line tool
>> > > > > > kafka.admin.FeatureCommand will work.  For example, the flags
>> > > > > > that it will take and the output that it will generate to STDOUT.
>> > > > > >
>> > > > > > cheers,
>> > > > > > Colin
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I've opened KIP-584, which is intended to provide a versioning
>> > > > > > > scheme for features. I'd like to use this thread to discuss the
>> > > > > > > same. I'd appreciate any feedback on this. Here is a link to
>> > > > > > > KIP-584:
>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
>> > > > > > >
>> > > > > > > Thank you!
>> > > > > > >
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Kowshik
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
