Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

Sophie Blee-Goldman Wed, 23 Oct 2019 11:38:33 -0700

Thanks for the explanation, makes sense to me! As for the API, one other
thought I had is might we ever want or need to introduce any other configs
or parameters in the future? Obviously that's difficult to say now (or
maybe the
answer seems obviously "no") but we seem to often end up needing to add new
overloads and/or deprecate old ones as new features or requirements come
into
play.


What do you (and others?) think about wrapping the config parameters (ie
everything
except the actual grouped streams) in a new config object? For example, the
CogroupedStream#aggregate field could take a single Cogrouped object,
which itself would have an initializer and a materialized. If we ever need
to add
a new parameter, we can just add it to the Cogrouped class.

Also, will the backing store be available for IQ if a Materialized is
passed in?

On Wed, Oct 23, 2019 at 10:49 AM Walker Carlson <wcarl...@confluent.io>
wrote:

> Hi Sophie,
>
> Thank you for your comments. As for the different methods signatures I have
> not really considered any other options but  while I do agree it is
> confusing, I don't see any obvious solutions. The problem is that the
> cogroup essentially pairs a group stream with an aggregator and when it is
> first made the method is called on a groupedStream already. However each
> subsequent stream-aggregator pair is added on to a cogroup stream so it
> needs both arguments.
>
> For the second question you should not need a joiner. The idea is that you
> can collect many grouped streams with overlapping key spaces and any kind
> of value types. Once aggregated its value will be reduced into one type.
> This is why you need only one initializer. Each aggregator will need to
> integrate the new value with the new object made in the initializer.
> Does that make sense?
>
> This is a good question and I will include this explanation in the kip as
> well.
>
> Thanks,
> Walker
>
> On Tue, Oct 22, 2019 at 8:59 PM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
>
> > Hey Walker,
> >
> > Thanks for the KIP! I have just a couple of questions:
> >
> > 1) It seems a little awkward to me that with the current API, we have a
> > nearly identical
> > "add stream to cogroup" method, except for the first which has a
> different
> > signature
> > (ie the first stream is joined as stream.cogroup(Aggregator) while the
> > subsequent ones
> > are joined as .cogroup(Stream, Aggregator) ). I'm not sure what it would
> > look like exactly,
> > but I was just wondering if you'd considered and/or rejected any
> > alternative APIs?
> >
> > 2) This might just be my lack of familiarity with "cogroup" as a concept,
> > but with the
> > current (non-optimal) API the user seems to have some control over how
> > exactly
> > the different streams are joined through the ValueJoiners. Would this new
> > cogroup
> > simply concatenate the values from the different cogroup streams, or
> could
> > users
> > potentially pass some kind of Joiner to the cogroup/aggregate methods?
> Or,
> > is the
> > whole point of cogroups that you no longer ever need to specify a Joiner?
> > If so, you
> > should add a short line to the KIP explaining that for those of us who
> > aren't fluent
> > in cogroup semantics :)
> >
> > Cheers,
> > Sophie
> >
> > On Thu, Oct 17, 2019 at 3:06 PM Walker Carlson <wcarl...@confluent.io>
> > wrote:
> >
> > > Good catch I updated that.
> > >
> > > I have made a PR for this KIP
> > >
> > > I then am splitting it into 3 parts, first cogroup for a key-value
> store
> > (
> > > here <https://github.com/apache/kafka/pull/7538>), then for a
> > > timeWindowedStore, and then a sessionWindowedStore + ensuring
> > partitioning.
> > >
> > > Walker
> > >
> > > On Tue, Oct 15, 2019 at 12:47 PM Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > > > Walker,
> > > >
> > > > thanks for picking up the KIP and reworking it for the changed API.
> > > >
> > > > Overall, the updated API suggestions make sense to me. The seem to
> > align
> > > > quite nicely with our current API design.
> > > >
> > > > One nit: In `CogroupedKStream#aggregate(...)` the type parameter of
> > > > `Materialized` should be `V`, not `VR`?
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > > On 10/14/19 2:57 PM, Walker Carlson wrote:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup
> > > > > here
> > > > > is a link
> > > > >
> > > > > On Mon, Oct 14, 2019 at 2:52 PM Walker Carlson <
> > wcarl...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >> I have picked up and updated KIP-150. Due to changes to the
> project
> > > > since
> > > > >> KIP #150 was written there are a few items that need to be
> updated.
> > > > >>
> > > > >> First item that changed is the adoption of the Materialized
> > parameter.
> > > > >>
> > > > >> The second item is the WindowedBy. How the old KIP handles
> windowing
> > > is
> > > > >> that it overloads the aggregate function to take in a Window
> object
> > as
> > > > well
> > > > >> as the other parameters. The current practice to window
> > > grouped-streams
> > > > is
> > > > >> to call windowedBy and receive a windowed stream object. The
> > existing
> > > > >> interface for a windowed stream made from a grouped stream will
> not
> > > work
> > > > >> for cogrouped streams. Hence, we have to make new interfaces for
> > > > cogrouped
> > > > >> windowed streams.
> > > > >>
> > > > >> Please take a look, I would like to hear your feedback,
> > > > >>
> > > > >> Walker
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

Reply via email to