Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

Sophie Blee-Goldman Wed, 23 Oct 2019 21:59:36 -0700

> I can personally not see any need to add other configuration
Famous last words?


Just kidding, 95% confidence is more than enough to  me (and better to
optimize for current
design than for hypothetical future changes).

One last question I have then is about the operator/store/repartition
naming -- seems like
we can name the underlying store/changelog through the Materialized
parameter, but do we
also want to include an overload taking a Named parameter for the operator
name (as in the
KTable#join variations)?

On Wed, Oct 23, 2019 at 5:14 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> Interesting idea, Sophie.
>
> So far, we tried to reuse existing config objects and only add new ones
> when needed to avoid creating "redundant" classes. This is of course a
> reactive approach (with the drawback to deprecate stuff if we change it,
> as you described).
>
> I can personally not see any need to add other configuration parameters
> atm, so it's a 95% obvious "no" IMHO. The final `aggregate()` has only a
> single state store that we need to configure, and reusing `Materialized`
> seems to be appropriate.
>
> Also note, that the `Initializer` is a mandatory parameter and not a
> configuration and should be passed directly, and not via a configuration
> object.
>
>
> -Matthias
>
> On 10/23/19 11:37 AM, Sophie Blee-Goldman wrote:
> > Thanks for the explanation, makes sense to me! As for the API, one other
> > thought I had is might we ever want or need to introduce any other
> configs
> > or parameters in the future? Obviously that's difficult to say now (or
> > maybe the
> > answer seems obviously "no") but we seem to often end up needing to add
> new
> > overloads and/or deprecate old ones as new features or requirements come
> > into
> > play.
> >
> > What do you (and others?) think about wrapping the config parameters (ie
> > everything
> > except the actual grouped streams) in a new config object? For example,
> the
> > CogroupedStream#aggregate field could take a single Cogrouped object,
> > which itself would have an initializer and a materialized. If we ever
> need
> > to add
> > a new parameter, we can just add it to the Cogrouped class.
> >
> > Also, will the backing store be available for IQ if a Materialized is
> > passed in?
> >
> > On Wed, Oct 23, 2019 at 10:49 AM Walker Carlson <wcarl...@confluent.io>
> > wrote:
> >
> >> Hi Sophie,
> >>
> >> Thank you for your comments. As for the different methods signatures I
> have
> >> not really considered any other options but  while I do agree it is
> >> confusing, I don't see any obvious solutions. The problem is that the
> >> cogroup essentially pairs a group stream with an aggregator and when it
> is
> >> first made the method is called on a groupedStream already. However each
> >> subsequent stream-aggregator pair is added on to a cogroup stream so it
> >> needs both arguments.
> >>
> >> For the second question you should not need a joiner. The idea is that
> you
> >> can collect many grouped streams with overlapping key spaces and any
> kind
> >> of value types. Once aggregated its value will be reduced into one type.
> >> This is why you need only one initializer. Each aggregator will need to
> >> integrate the new value with the new object made in the initializer.
> >> Does that make sense?
> >>
> >> This is a good question and I will include this explanation in the kip
> as
> >> well.
> >>
> >> Thanks,
> >> Walker
> >>
> >> On Tue, Oct 22, 2019 at 8:59 PM Sophie Blee-Goldman <
> sop...@confluent.io>
> >> wrote:
> >>
> >>> Hey Walker,
> >>>
> >>> Thanks for the KIP! I have just a couple of questions:
> >>>
> >>> 1) It seems a little awkward to me that with the current API, we have a
> >>> nearly identical
> >>> "add stream to cogroup" method, except for the first which has a
> >> different
> >>> signature
> >>> (ie the first stream is joined as stream.cogroup(Aggregator) while the
> >>> subsequent ones
> >>> are joined as .cogroup(Stream, Aggregator) ). I'm not sure what it
> would
> >>> look like exactly,
> >>> but I was just wondering if you'd considered and/or rejected any
> >>> alternative APIs?
> >>>
> >>> 2) This might just be my lack of familiarity with "cogroup" as a
> concept,
> >>> but with the
> >>> current (non-optimal) API the user seems to have some control over how
> >>> exactly
> >>> the different streams are joined through the ValueJoiners. Would this
> new
> >>> cogroup
> >>> simply concatenate the values from the different cogroup streams, or
> >> could
> >>> users
> >>> potentially pass some kind of Joiner to the cogroup/aggregate methods?
> >> Or,
> >>> is the
> >>> whole point of cogroups that you no longer ever need to specify a
> Joiner?
> >>> If so, you
> >>> should add a short line to the KIP explaining that for those of us who
> >>> aren't fluent
> >>> in cogroup semantics :)
> >>>
> >>> Cheers,
> >>> Sophie
> >>>
> >>> On Thu, Oct 17, 2019 at 3:06 PM Walker Carlson <wcarl...@confluent.io>
> >>> wrote:
> >>>
> >>>> Good catch I updated that.
> >>>>
> >>>> I have made a PR for this KIP
> >>>>
> >>>> I then am splitting it into 3 parts, first cogroup for a key-value
> >> store
> >>> (
> >>>> here <https://github.com/apache/kafka/pull/7538>), then for a
> >>>> timeWindowedStore, and then a sessionWindowedStore + ensuring
> >>> partitioning.
> >>>>
> >>>> Walker
> >>>>
> >>>> On Tue, Oct 15, 2019 at 12:47 PM Matthias J. Sax <
> >> matth...@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> Walker,
> >>>>>
> >>>>> thanks for picking up the KIP and reworking it for the changed API.
> >>>>>
> >>>>> Overall, the updated API suggestions make sense to me. The seem to
> >>> align
> >>>>> quite nicely with our current API design.
> >>>>>
> >>>>> One nit: In `CogroupedKStream#aggregate(...)` the type parameter of
> >>>>> `Materialized` should be `V`, not `VR`?
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 10/14/19 2:57 PM, Walker Carlson wrote:
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup
> >>>>>> here
> >>>>>> is a link
> >>>>>>
> >>>>>> On Mon, Oct 14, 2019 at 2:52 PM Walker Carlson <
> >>> wcarl...@confluent.io>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> I have picked up and updated KIP-150. Due to changes to the
> >> project
> >>>>> since
> >>>>>>> KIP #150 was written there are a few items that need to be
> >> updated.
> >>>>>>>
> >>>>>>> First item that changed is the adoption of the Materialized
> >>> parameter.
> >>>>>>>
> >>>>>>> The second item is the WindowedBy. How the old KIP handles
> >> windowing
> >>>> is
> >>>>>>> that it overloads the aggregate function to take in a Window
> >> object
> >>> as
> >>>>> well
> >>>>>>> as the other parameters. The current practice to window
> >>>> grouped-streams
> >>>>> is
> >>>>>>> to call windowedBy and receive a windowed stream object. The
> >>> existing
> >>>>>>> interface for a windowed stream made from a grouped stream will
> >> not
> >>>> work
> >>>>>>> for cogrouped streams. Hence, we have to make new interfaces for
> >>>>> cogrouped
> >>>>>>> windowed streams.
> >>>>>>>
> >>>>>>> Please take a look, I would like to hear your feedback,
> >>>>>>>
> >>>>>>> Walker
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

Reply via email to