Re: [VOTE] KIP-159: Introducing Rich functions to Streams

Guozhang Wang Mon, 06 Nov 2017 18:18:58 -0800

Regarding the API design (the proposed set of overloads v.s. one overload
on #map to enrich the record), I think what we have represents a good
trade-off between API succinctness and user convenience: on one hand we
definitely want to keep as fewer overloaded functions as possible. But on
the other hand if we only do that in, say, the #map() function then this
enrichment could be an overkill: think of a topology that has 7 operators
in a chain, where users want to access the record context on operator #2
and #6 only, with the "enrichment" manner they need to do the enrichment on
operator #2 and keep it that way until #6. In addition, the RecordContext
fields (topic, offset, etc) are really orthogonal to the key-value payloads
themselves, so I think separating them into this object is a cleaner way.


Regarding the RecordContext inheritance, this is actually a good point that
have not been discussed thoroughly before. Here are my my two cents: one
natural way would be to inherit the record context from the "triggering"
record, for example in a join operator, if the record from stream A
triggers the join then the record context is inherited from with that
record. This is also aligned with the lower-level PAPI interface. A counter
argument, though, would be that this is sort of leaking the internal
implementations of the DSL, so that moving forward if we did some
refactoring to our join implementations so that the triggering record can
change, the RecordContext would also be different. I do not know how much
it would really affect end users, but would like to hear your opinions.


Guozhang


On Mon, Nov 6, 2017 at 1:00 PM, Jeyhun Karimov <[email protected]> wrote:

> Hi Jan,
>
> Sorry for late reply.
>
>
> The API Design doesn't look appealing
>
>
> In terms of API design we tried to preserve the java functional interfaces.
> We applied the same set of rich methods for KTable to make it compatible
> with the rest of overloaded APIs.
>
> It should be 100% sufficient to offer a KTable + KStream that is directly
> > feed from a topic with 1 additional overload for the #map() methods to
> > cover every usecase while keeping the API in a way better state.
>
>
> - IMO this seems a workaround, rather than a direct solution.
>
> Perhaps we should continue this discussion in DISCUSS thread.
>
>
> Cheers,
> Jeyhun
>
>
> On Mon, Nov 6, 2017 at 9:14 PM Jan Filipiak <[email protected]>
> wrote:
>
> > Hi.
> >
> > I do understand that it might come in Handy.
> >  From my POV in any relational algebra this is only a projection.
> > Currently we hide these "fields" that come with the input record.
> > It should be 100% sufficient to offer a KTable + KStream that is directly
> > feed from a topic with 1 additional overload for the #map() methods to
> > cover every usecase while keeping the API in a way better state.
> >
> > best Jan
> >
> > On 06.11.2017 17:52, Matthias J. Sax wrote:
> > > Jan,
> > >
> > > I understand what you are saying. However, having a RecordContext is
> > > super useful for operations that are applied to input topic. Many users
> > > requested this feature -- it's much more convenient that falling back
> to
> > > transform() to implement a a filter() for example that want to access
> > > some meta data.
> > >
> > > Because we cannot distinguish different "origins" of a KStream/KTable,
> I
> > > am not sure if there would be a better way to do this. The only
> > > "workaround" I see, is to have two KStream/KTable interfaces each and
> we
> > > would use the first one for KStream/KTable with "proper" RecordContext.
> > > But this does not seem to be a good solution either.
> > >
> > > Note, a KTable can also be read directly from a topic, I agree that
> > > using RecordContext on a KTable that is the result of an aggregation is
> > > questionable. But I don't see a reason to down vote the KIP for this
> > reason.
> > >
> > > WDYT about this?
> > >
> > >
> > > -Matthias
> > >
> > > On 11/1/17 10:19 PM, Jan Filipiak wrote:
> > >> -1 non binding
> > >>
> > >> I don't get the motivation.
> > >> In 80% of my DSL processors there is no such thing as a reasonable
> > >> RecordContext.
> > >> After a join  the record I am processing belongs to at least 2 topics.
> > >> After a Group by the record I am processing was created from multiple
> > >> offsets.
> > >>
> > >> The API Design doesn't look appealing
> > >>
> > >> Best Jan
> > >>
> > >>
> > >>
> > >> On 01.11.2017 22:02, Jeyhun Karimov wrote:
> > >>> Dear community,
> > >>>
> > >>> It seems the discussion for KIP-159 [1] converged finally. I would
> > >>> like to
> > >>> initiate voting for the particular KIP.
> > >>>
> > >>>
> > >>>
> > >>> [1]
> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 159%3A+Introducing+Rich+functions+to+Streams
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Jeyhun
> > >>>
> >
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

Reply via email to