Re: [VOTE] KIP-708: Rack awareness for Kafka Streams

Sophie Blee-Goldman Tue, 16 Mar 2021 12:34:39 -0700

>
> Could we not reuse the version of the subscription
> data? What are the main benefits of introducing "ClientTagsVersion"?



No, this version would have to be distinct from the protocol version that
we use for the subscription itself.
The reason being that the SubscriptionInfo version is only tied to the
metadata that we, as Kafka Streams,
choose to encode in a specific version. The "ClientTagsVersion" on the
other hand would be tied to the
specific tags that the *user* has chosen to encode in the
SubscriptionInfoData. So the SubscriptionInfo
version is a constant in the Streams source code, whereas you might have
any number of different
client tag versions as you evolve the client.tags used by your application.
They're orthogonal

(As for how to determine the ClientTagsVersion, most likely we would need
to add an additional config
and push it to the user to bump the version when they change the tags. We
could try to derive it ourselves,
but then we would need some way to persist the current version & tags so
that we can tell when to bump
the version, and I couldn't think of an appropriate way to do this off the
top of my head. Anyways we don't
really need to have this conversation until we want to make the tags
evolvable, as long as we know it's at
least possible)

Of course, we could sidestep all of this by just serializing the name of
each client.tag instead of an encoded
key based on its position in the configured task.rack.aware.assignment.tags
list, right? I understand the intention
is to save on bytes, but (a) it was always my impression that the
AssignmentInfo was more at risk of hitting
the max message size than the SubscriptionInfo, since the assignment has to
encode info for all tasks
across the app while the subscription only encodes local info, and (b) how
long do we expect the client
tags to really be? Can't we just push this on to the user to come up with
their own schema and embed
an abbreviation code in the client.tags, or just tell them not to use
insanely long tags (which seems unlikely in
the first place)? WDYT?

That makes sense about purposely excluding "standby" from the config name.
In that case
I would be happy with just "task.rack.aware.assignment.tags", although I'd
propose to shorten
it further by removing the "task" part of "standby.task" as well, ie just
"rack.aware.assignment.tags"
But I'll leave it up to Levani to make this call :)

On Tue, Mar 16, 2021 at 6:34 AM Bruno Cadonna <br...@confluent.io.invalid>
wrote:

> Forgot to say ...
>
> I am fine with the rest of the name you proposed, i.e.,
> "task.rack.aware.assignment.tags".
>
> Best,
> Bruno
>
> On 16.03.21 09:30, Bruno Cadonna wrote:
> > Hi Sophie,
> >
> > I am +1, for explicitly documenting that this list must be identical in
> > contents and order across all clients in the application in the KIP. And
> > later in the docs of the config. If we are too much concerned about this
> > and we think that we should explicitly check the order and content, we
> > could think about another encoding that is order independent or not
> > encode the tags at all.
> >
> > Good point about upgrading and evolving. That seems related to the
> > encoding of the tags. If the encoding contained a bit more information,
> > we could use the intersection of the tags to evolve the tags of existing
> > applications. That means, we only use tags that are present on all
> > clients. Having an encoding that contains more information would also
> > give us the possibility to support removing, changing, and reordering of
> > tags.
> >
> > I do not completely understand your question about the
> > "ClientTagsVersion". Could we not reuse the version of the subscription
> > data? What are the main benefits of introducing "ClientTagsVersion"?
> > Versioning a field of the protocol seems a bit too detailed for me, but
> > I could easily be missing something important here.
> >
> > Regarding "standby.task.rack.aware.assignment.tags", we intentionally
> > tried to avoid the words "standby" and "replica" in this name to not
> > limit this config to standby tasks. In future, we may want to use the
> > same config also for active tasks. Admittedly, I do not yet know how we
> > would use this config for active tasks but I think it is better to keep
> > it generic as much as reasonable.
> >
> > Best,
> > Bruno
> >
> >
> > On 15.03.21 23:07, Sophie Blee-Goldman wrote:
> >> Hey Levani, thanks for the KIP! This looks great.
> >>
> >> A few comments/questions about the "task.assignment.rack.awareness"
> >> config:
> >> since this is used to determine the encoding of the client tags, we
> >> should
> >> make sure to specify that this list must be identical in contents and
> >> order
> >> across all clients in the application. Unfortunately there doesn't
> >> seem to
> >> be a good way to actually enforce this, so we should call it out in the
> >> config doc string at the least.
> >>
> >> On that note, should it be possible for users to upgrade or evolve their
> >> tags over time? For example if a user wants to leverage this feature
> >> for an
> >> existing app, or add new tags to an app that already has some
> >> configured. I
> >> think we would need to either enforce that you can only add new tags
> >> to the
> >> end but never remove/change/reorder the existing ones, or else adopt a
> >> similar strategy as to version probing and force all clients to remain
> on
> >> the old protocol until everyone in the group has been updated to use the
> >> new tags. It's fine with me if you'd prefer to leave this out of scope
> >> for
> >> the time being, as long as we design this to be forwards-compatible as
> >> best
> >> we can. Have you considered adding a "ClientTagsVersion" to the
> >> SubscriptionInfo so that we're set up to extend this feature in the
> >> future?
> >> In my experience so far, any time we *don't* version a protocol like
> this
> >> we end up regretting it later.
> >>
> >> Whatever you decide, it should be documented clearly -- in the KIP and
> in
> >> the actual docs, ie upgrade guide and/or the config doc string -- so
> that
> >> users know whether they can ever change the client tags on a running
> >> application or not. (I think this is hinted at in the KIP, but not
> called
> >> out explicitly)
> >>
> >> By the way, I feel the "task.assignment.rack.awareness" config should
> >> have
> >> the words "clients" and/or "tags" somewhere in it, otherwise it's a bit
> >> unclear what it actually means. And maybe it should specify that it
> >> applies
> >> to standby task placement only? Obviously we don't need to cover every
> >> possible detail in the config name alone, but it could be a little more
> >> specific. What about "standby.task.rack.aware.assignment.tags" or
> >> something
> >> like that?
> >>
> >> On Mon, Mar 15, 2021 at 2:12 PM Levani Kokhreidze
> >> <levani.co...@gmail.com>
> >> wrote:
> >>
> >>> Hello all,
> >>>
> >>> Bumping this thread as we are one binding vote short accepting this
> KIP.
> >>> Please let me know if you have any extra concerns and/or suggestions.
> >>>
> >>> Regards,
> >>> Levani
> >>>
> >>>> On 12. Mar 2021, at 13:14, Levani Kokhreidze <levani.co...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi Guozhang,
> >>>>
> >>>> Thanks for the feedback. I think it makes sense.
> >>>> I updated the KIP with your proposal [1], it’s a nice optimisation.
> >>>> I do agree that having the same configuration across Kafka Streams
> >>> instances is the reasonable requirement.
> >>>>
> >>>> Best,
> >>>> Levani
> >>>>
> >>>> [1] -
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-708%3A+Rack+awareness+for+Kafka+Streams
> >>>
> >>> <
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-708%3A+Rack+awareness+for+Kafka+Streams
> >>>
> >>>>
> >>>>
> >>>>
> >>>>> On 12. Mar 2021, at 03:36, Guozhang Wang <wangg...@gmail.com
> <mailto:
> >>> wangg...@gmail.com>> wrote:
> >>>>>
> >>>>> Hello Levani,
> >>>>>
> >>>>> Thanks for the great write-up! I think this proposal makes sense,
> >>> though I
> >>>>> have one minor suggestion regarding the protocol format change:
> >>>>> note the
> >>>>> subscription info is part of the group metadata message that we
> >>>>> need to
> >>>>> write into the internal topic, and hence it's always better if we can
> >>> save
> >>>>> on the number of bytes written there. For this, I'm wondering if we
> >>>>> can
> >>>>> encode the key part instead of writing raw bytes based on the
> >>>>> configurations, i.e.:
> >>>>>
> >>>>> 1. streams will look at the `task.assignment.rack.awareness`
> >>>>> values, and
> >>>>> encode them in a deterministic manner, e.g. in your example zone = 0,
> >>>>> cluster = 1. This assumes that all instances will configure this
> value
> >>> in
> >>>>> the same way and then with a deterministic manner all instances will
> >>> have
> >>>>> the same encodings, which I think is a reasonable requirement.
> >>>>> 2. the sent protocol would be "key => short, value => bytes" instead.
> >>>>>
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> Otherwise, I'm +1 on the KIP!
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 11, 2021 at 8:29 AM John Roesler <vvcep...@apache.org
> >>> <mailto:vvcep...@apache.org>> wrote:
> >>>>>
> >>>>>> Thanks for the KIP!
> >>>>>>
> >>>>>> I'm +1 (binding)
> >>>>>>
> >>>>>> -John
> >>>>>>
> >>>>>> On Wed, 2021-03-10 at 13:13 +0200, Levani Kokhreidze wrote:
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I’d like to start the voting on KIP-708 [1]
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Levani
> >>>>>>>
> >>>>>>> [1] -
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-708%3A+Rack+awareness+for+Kafka+Streams
> >>>
> >>> <
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-708%3A+Rack+awareness+for+Kafka+Streams
> >>>
> >>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>
> >>>
> >>>
> >>
>

Re: [VOTE] KIP-708: Rack awareness for Kafka Streams

Reply via email to