Re: [DISCUSS] KIP-372: Naming Joins and Grouping

Guozhang Wang Fri, 14 Sep 2018 10:50:00 -0700

Hello John,

Not sure if I completely understand your email above. Are you suggesting to
still use the proposed Joined / Grouped object to indicate the underlying
processor names *in addition to* the repartition topic names?


My reasoning is that, if we do not want to use the proposed names to
indicate underlying processor names, only the repartition topic names, then
the current proposal looks good since KTable#join would not introduce
repartition topics at all. AND regarding underlying processor names, it is
suggested that we should only consider it in KIP-307 separately. Are you
suggesting that we should consider them together?


Guozhang

On Thu, Sep 13, 2018 at 3:49 PM, John Roesler <j...@confluent.io> wrote:

> Hey all,
>
> I think it's slightly out of scope for this KIP, but I'm not sure it's
> right to add a name to ValueJoiner or KeyValueMapper.
> Both of those are "functional interfaces", that is, they are basically
> named functions.
>
> It seems like we should preserve this property both to provide a clean
> separation between the operation *logic* and the operator *config*.
>
> It's a good point that `Joined` doesn't appear in the KTable join. However,
> KTable join does have the variant that accepts "Materialized".
> This makes it similar to the grouping operators that currently accept
> "Serialized", namely the operator is already configurable, but only in
> a narrow way (we can only configure the materialization of the table or the
> serialization of the grouping).
>
> Consider as an alternative the KStream.join operation that takes a config
> object called "Joined". This implies no more than that we are
> configuring the join operation. If we are deciding to allow adding a name
> to the operation, it's natural to add it to the operation's config.
>
> Conversely, we also want to name the KTable.join operation, not the
> materialization itself (which already has a name with separate semantics).
>
> To me, this suggests that, just like we propose to subsume the Serialized
> config for grouping into a more general config called Grouped, we should
> also want to do something similar and replace `KTable#join(KTable<>,
> ValueJoiner<>, Materialized<>)` with one taking a more general
> configuration, maybe:
> `KTable#join(KTable<>, ValueJoiner<>, TableJoined<>)`?
>
> Does this make sense?
>
> Thanks,
> -John
>
> On Thu, Sep 13, 2018 at 3:45 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Just to clarify on 2): currently KIP-307 do not have proposed APIs for
> > `groupBy/groupByKey` naming schemes, and for joins its current proposal
> is
> > to extend ValueJoiner with Named and hence this part is what I meant to
> > have "overlaps".
> >
> > Thinking about it a bit more, since Joined is only used for S-S and S-T
> > joins but not T-T joins, having the naming schemes on Joined would not be
> > sufficient, and extending ValueJoiner would indeed be a good choice.
> >
> > As for groupBy, since it is using KeyValueMapper which is supposed to be
> > extended with Named in KIP-307, it does not require extending to
> processor
> > nodes as well.
> >
> >
> > Given this, I'm fine with limiting the scope to only repartition topics.
> >
> >
> > Guozhang
> >
> > On Wed, Sep 12, 2018 at 10:22 PM, Matthias J. Sax <matth...@confluent.io
> >
> > wrote:
> >
> > > Follow up comments:
> > >
> > > 1) We should either use `[app-id]-this|other-[join-name]-repartition`
> or
> > > `app-id]-[join-name]-left|right-repartition` but we should not change
> > > the pattern depending if the user specifies a name of not. I am fine
> > > with both patterns---just want to make sure with stick with one.
> > >
> > > 2) I didn't see why we would need to do this in this KIP. KIP-307 seems
> > > to be orthogonal, and thus KIP-372 should not change any processor
> > > names, but KIP-307 should define a holistic strategy for all processor.
> > > Otherwise, we might up with different strategies or revert what we
> > > decide in this KIP if it's not compatible with KIP-307.
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 9/12/18 6:28 PM, Guozhang Wang wrote:
> > > > Hello Bill,
> > > >
> > > > I made a pass over your proposal and here are some questions:
> > > >
> > > > 1. For Joined names, the current proposal is to define the
> repartition
> > > > topic names as
> > > >
> > > > * [app-id]-this-[join-name]-repartition
> > > >
> > > > * [app-id]-other-[join-name]-repartition
> > > >
> > > >
> > > > And if [join-name] not specified, stay the same, which is:
> > > >
> > > > * [previous-processor-name]-repartition for both Stream-Stream (S-S)
> > > join
> > > > and S-T join
> > > >
> > > > I think it is more natural to rename it to
> > > >
> > > > * [app-id]-[join-name]-left-repartition
> > > >
> > > > * [app-id]-[join-name]-right-repartition
> > > >
> > > >
> > > > 2. I'd suggest to use the name to also define the corresponding
> > processor
> > > > names accordingly, in addition to the repartition topic names. Note
> > that
> > > > for joins, this may be overlapping with KIP-307
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL>
> > > > as
> > > > it also have proposals for defining processor names for join
> operators
> > as
> > > > well.
> > > >
> > > > 3. Could you also specify how this would affect the optimization for
> > > > merging multiple repartition topics?
> > > >
> > > > 4. In the "Compatibility, Deprecation, and Migration Plan" section,
> > could
> > > > you also mention the following scenarios, if any of the upgrade path
> > > would
> > > > be changed:
> > > >
> > > >  a) changing user DSL code: under which scenarios users can now do a
> > > > rolling bounce instead of resetting applications.
> > > >
> > > >  b) upgrading from older version to new version, with all the names
> > > > specified, and with optimization turned on. E.g. say we have the code
> > > > written in 2.1 with all names specified, and now upgrading to 2.2
> with
> > > new
> > > > optimizations that may potentially change the repartition topics. Is
> > that
> > > > always safe to do?
> > > >
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Wed, Sep 12, 2018 at 4:52 PM, Bill Bejeck <bbej...@gmail.com>
> > wrote:
> > > >
> > > >> All I'd like to start a discussion on KIP-372 for the naming of
> joins
> > > and
> > > >> grouping operations in Kafka Streams.
> > > >>
> > > >> The KIP page can be found here:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> 372%3A+Naming+Joins+and+Grouping
> > > >>
> > > >> I look forward to feedback and comments.
> > > >>
> > > >> Thanks,
> > > >> Bill
> > > >>
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-372: Naming Joins and Grouping

Reply via email to