Re: [DISCUSS] KIP-372: Naming Joins and Grouping

Guozhang Wang Thu, 13 Sep 2018 13:45:41 -0700

Just to clarify on 2): currently KIP-307 does not propose any APIs for
naming `groupBy/groupByKey`, and for joins its current proposal is to
extend ValueJoiner with Named; that part is what I meant by "overlaps".


Thinking about it a bit more, since Joined is only used for S-S and S-T
joins but not T-T joins, having the naming schemes on Joined would not be
sufficient, and extending ValueJoiner would indeed be a good choice.

As for groupBy, since it uses KeyValueMapper, which is supposed to be
extended with Named in KIP-307, this KIP does not need to cover processor
node names there either.


Given this, I'm fine with limiting the scope to only repartition topics.
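
For reference, here is a minimal sketch of the two candidate repartition
topic patterns from point 1) in the quoted thread below. It only formats
strings; the class and method names are hypothetical and not part of Kafka
Streams or the KIP, and "orders-app" / "customer-join" are made-up values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Kafka Streams: it only formats strings to
// compare the two candidate repartition topic naming patterns.
public class RepartitionTopicNames {

    // Pattern A: [app-id]-this|other-[join-name]-repartition
    public static List<String> thisOtherPattern(String appId, String joinName) {
        return Arrays.asList(
            appId + "-this-" + joinName + "-repartition",
            appId + "-other-" + joinName + "-repartition");
    }

    // Pattern B: [app-id]-[join-name]-left|right-repartition
    public static List<String> leftRightPattern(String appId, String joinName) {
        return Arrays.asList(
            appId + "-" + joinName + "-left-repartition",
            appId + "-" + joinName + "-right-repartition");
    }

    public static void main(String[] args) {
        // Made-up example values, for illustration only.
        thisOtherPattern("orders-app", "customer-join").forEach(System.out::println);
        leftRightPattern("orders-app", "customer-join").forEach(System.out::println);
    }
}
```

Whichever pattern is picked, the point is that the same function of
(app-id, join-name) is used in every case, so topic names stay predictable.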


Guozhang

On Wed, Sep 12, 2018 at 10:22 PM, Matthias J. Sax <[email protected]>
wrote:

> Follow up comments:
>
> 1) We should either use `[app-id]-this|other-[join-name]-repartition` or
> `[app-id]-[join-name]-left|right-repartition` but we should not change
> the pattern depending on whether the user specifies a name or not. I am
> fine with both patterns---just want to make sure we stick with one.
>
> 2) I didn't see why we would need to do this in this KIP. KIP-307 seems
> to be orthogonal, and thus KIP-372 should not change any processor
> names, but KIP-307 should define a holistic strategy for all processors.
> Otherwise, we might end up with different strategies or revert what we
> decide in this KIP if it's not compatible with KIP-307.
>
>
> -Matthias
>
>
> On 9/12/18 6:28 PM, Guozhang Wang wrote:
> > Hello Bill,
> >
> > I made a pass over your proposal and here are some questions:
> >
> > 1. For Joined names, the current proposal is to define the repartition
> > topic names as
> >
> > * [app-id]-this-[join-name]-repartition
> >
> > * [app-id]-other-[join-name]-repartition
> >
> >
> > And if [join-name] is not specified, stay the same, which is:
> >
> > * [previous-processor-name]-repartition for both Stream-Stream (S-S)
> > join and S-T join
> >
> > I think it is more natural to rename it to
> >
> > * [app-id]-[join-name]-left-repartition
> >
> > * [app-id]-[join-name]-right-repartition
> >
> >
> > 2. I'd suggest using the name to also define the corresponding processor
> > names, in addition to the repartition topic names. Note that for joins,
> > this may overlap with KIP-307
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL>
> > as it also has proposals for defining processor names for join operators.
> >
> > 3. Could you also specify how this would affect the optimization for
> > merging multiple repartition topics?
> >
> > 4. In the "Compatibility, Deprecation, and Migration Plan" section, could
> > you also mention the following scenarios, if any of the upgrade paths
> > would be changed:
> >
> >  a) changing user DSL code: under which scenarios users can now do a
> > rolling bounce instead of resetting applications.
> >
> >  b) upgrading from an older version to a new version, with all the names
> > specified, and with optimization turned on. E.g. say we have the code
> > written in 2.1 with all names specified, and now upgrading to 2.2 with
> > new optimizations that may potentially change the repartition topics. Is
> > that always safe to do?
> >
> >
> >
> > Guozhang
> >
> >
> > On Wed, Sep 12, 2018 at 4:52 PM, Bill Bejeck <[email protected]> wrote:
> >
> >> All, I'd like to start a discussion on KIP-372 for the naming of joins
> >> and grouping operations in Kafka Streams.
> >>
> >> The KIP page can be found here:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Joins+and+Grouping
> >>
> >> I look forward to feedback and comments.
> >>
> >> Thanks,
> >> Bill
> >>
> >
> >
> >
>
>


-- 
-- Guozhang
