Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-20 Thread Jan Filipiak
On 19.11.2017 21:12, Guozhang Wang wrote: Jan: which approach are you referring to as "the approach that is on the table would be perfect"? The SourcesKStream/Table suggestion. Note that in today's PAPI layer we are already effectively exposing the record context which has the issues that we

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-20 Thread Guozhang Wang
@Jeyhun, I understand that the discussion about KIP-159 is dragging on, so while we continue discussing the whether and how of KIP-159, maybe you can start implementing the non-overlapping part of the KIP-149 APIs to get yourself unblocked? Guozhang On Sun, Nov 19, 2017 at 12:12 PM, Guozhang

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-19 Thread Guozhang Wang
Jan: which approach are you referring to as "the approach that is on the table would be perfect"? Note that in today's PAPI layer we are already effectively exposing the record context which has the issues that we have been discussing right now, and its semantics is always referring to the "proces

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-18 Thread Jan Filipiak
Hi, not an issue at all. IMO the approach that is on the table would be perfect. On 18.11.2017 10:58, Jeyhun Karimov wrote: Hi, I did not expect that Context would be this much of an issue. Instead of applying different semantics for different operators, I think we should remove this feature com

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-18 Thread Jeyhun Karimov
Hi, I did not expect that Context would be this much of an issue. Instead of applying different semantics for different operators, I think we should remove this feature completely. Cheers, Jeyhun On Sat 18. Nov 2017 at 07:49, Jan Filipiak wrote: > Yes, the mail said only join so I wanted to clar

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-17 Thread Jan Filipiak
Yes, the mail said only join so I wanted to clarify. On 17.11.2017 19:05, Matthias J. Sax wrote: Yes. But I think an aggregation is a many-to-one operation, too. For the stripping off part: internally, we can just keep some record context, but just do not allow users to access it (because th

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-17 Thread Matthias J. Sax
Yes. But I think an aggregation is a many-to-one operation, too. For the stripping off part: internally, we can just keep some record context, but just do not allow users to access it (because the context does not make sense for them) by hiding the corresponding APIs. -Matthias On 11/1

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-17 Thread Jan Filipiak
Hey, ... and group by. And yes, there is no logical context we can present. The context present has nothing to do with the record currently processed. It just doesn't come out https://en.wikipedia.org/wiki/Relational_algebra#Aggregation I am all in on this approach. Best Jan On 17.11.2017 07:

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-16 Thread Guozhang Wang
Matthias, For this idea, are you proposing that for any many-to-one mapping operations (for now only Join operators), we will strip off the record context in the resulting records and claim "we cannot infer its traced context anymore"? Guozhang On Thu, Nov 16, 2017 at 1:03 PM, Matthias J. Sax

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-16 Thread Matthias J. Sax
Any thoughts about my latest proposal? -Matthias On 11/10/17 10:02 PM, Jan Filipiak wrote: > Hi, > > i think this is the better way. Naming is always tricky Source is kinda > taken > I had TopicBackedK[Source|Table] in mind > but for the user its way better already IMHO > > Thank you for recons

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-10 Thread Jan Filipiak
Hi, I think this is the better way. Naming is always tricky; Source is kinda taken. I had TopicBackedK[Source|Table] in mind, but for the user it's way better already IMHO. Thank you for the reconsideration. Best Jan On 10.11.2017 22:48, Matthias J. Sax wrote: I was thinking about the source stream/

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-10 Thread Matthias J. Sax
I was thinking about the source stream/table idea once more and it seems it would not be too hard to implement: We add two new classes, SourceKStream extends KStream and SourceKTable extends KTable, and return them from StreamsBuilder#stream and StreamsBuilder#table respectively. As both are sub-classes,
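
The subclass idea in the message above can be pictured with a minimal, self-contained sketch. Everything here is a hypothetical stand-in written for illustration: the KStream, SourceKStream, RecordContext and RichPredicate types below are toy classes, not the real Kafka Streams API, and the list-backed "stream" is only an assumption to keep the example runnable. The point it tries to show is that only the source type, which still has a one-to-one link to a topic record, offers the context-aware overload, while anything derived from it falls back to the plain type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical record metadata; not the real Kafka Streams RecordContext.
final class RecordContext {
    final String topic;
    final int partition;
    final long offset;

    RecordContext(String topic, int partition, long offset) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
    }
}

// Hypothetical "rich" predicate that also sees the record's metadata.
interface RichPredicate<V> {
    boolean test(V value, RecordContext context);
}

// Toy stand-in for a stream: just a list of values, no metadata exposed.
class KStream<V> {
    final List<V> values;

    KStream(List<V> values) {
        this.values = values;
    }

    // Plain filter: available on every stream, sees only the value.
    KStream<V> filter(Predicate<V> predicate) {
        List<V> kept = new ArrayList<>();
        for (V value : values) {
            if (predicate.test(value)) {
                kept.add(value);
            }
        }
        return new KStream<>(kept);
    }
}

// Toy stand-in for a stream read directly from a topic: each value still
// has exactly one originating record, so a context can be offered.
class SourceKStream<V> extends KStream<V> {
    private final List<RecordContext> contexts;

    SourceKStream(List<V> values, List<RecordContext> contexts) {
        super(values);
        this.contexts = contexts;
    }

    // Rich overload: exists only on the source type. The result is a plain
    // KStream, so downstream operators no longer see any record context.
    KStream<V> filter(RichPredicate<V> predicate) {
        List<V> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (predicate.test(values.get(i), contexts.get(i))) {
                kept.add(values.get(i));
            }
        }
        return new KStream<>(kept);
    }
}
```

In this sketch a call such as `source.filter((value, ctx) -> ctx.offset >= 1)` compiles only on a SourceKStream; the KStream it returns accepts just the single-argument predicate, which mirrors the concern raised in the thread that a context has no natural meaning after joins or aggregations.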

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-09 Thread Jan Filipiak
Okay, looks like it would _at least work_ for Cached KTableSources. But we make it harder for the user to make mistakes by putting features into places where they don't make sense and don't help anyone. I once again think that my suggestion is easier to implement and more correct. I will use thi

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-09 Thread Guozhang Wang
With our current state store implementation, when we are doing the two-way join operators like Stream-Stream or Table-Table, where each stream's record may trigger the join, it is hard to retrieve the record context for the matched record from the other stream's materialized state since we do not k

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-09 Thread Guozhang Wang
Hello Jan, Regarding your question about caching: today we keep the record context with the cached entry already so when we flush the cache which may generate new records forwarding we will set the record context appropriately; and then after the flush is completed we will reset the context to the

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-07 Thread Jeyhun Karimov
Hi Jan, Thanks for your comments. I agree that the implementation should not introduce new "bugs" or "known issues" in future. I think we can either i) just drop RecordContext argument for join methods or ii) introduce binary aggregation logic for RecordContexts for two-input-stream-operators. An

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-07 Thread Jan Filipiak
On 07.11.2017 12:59, Jan Filipiak wrote: On 07.11.2017 11:20, Matthias J. Sax wrote: About implementation if we do the KIP as proposed: I agree with Guozhang that we would need to use the currently processed record's metadata in the context. This does leak some implementation details, but I pe

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-07 Thread Jan Filipiak
On 07.11.2017 11:20, Matthias J. Sax wrote: About implementation if we do the KIP as proposed: I agree with Guozhang that we would need to use the currently processed record's metadata in the context. This does leak some implementation details, but I personally don't see a big issue here (at the

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-07 Thread Matthias J. Sax
About implementation if we do the KIP as proposed: I agree with Guozhang that we would need to use the currently processed record's metadata in the context. This does leak some implementation details, but I personally don't see a big issue here (at the same time, I am also fine to remove the Record

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jan Filipiak
I agree completely. Exposing this information in a place where it has no _natural_ belonging might really be a bad blocker in the long run. Concerning your first point: I would argue it's not too hard to have a user keep track of these. If we still don't want the user to keep track of these I

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Guozhang Wang
Regarding the API design (the proposed set of overloads vs. one overload on #map to enrich the record), I think what we have represents a good trade-off between API succinctness and user convenience: on one hand we definitely want to keep as few overloaded functions as possible. But on the other

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jeyhun Karimov
Hi Jan, Sorry for the late reply. The API Design doesn't look appealing In terms of API design we tried to preserve the Java functional interfaces. We applied the same set of rich methods for KTable to make it compatible with the rest of the overloaded APIs. It should be 100% sufficient to offer a KT

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jan Filipiak
Hi. I do understand that it might come in handy. From my POV, in any relational algebra this is only a projection. Currently we hide these "fields" that come with the input record. It should be 100% sufficient to offer a KTable + KStream that is directly fed from a topic with 1 additional overloa

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Matthias J. Sax
Jan, I understand what you are saying. However, having a RecordContext is super useful for operations that are applied to input topics. Many users requested this feature -- it's much more convenient than falling back to transform() to implement a filter(), for example, that wants to access some meta

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-01 Thread Jan Filipiak
-1 (non-binding). I don't get the motivation. In 80% of my DSL processors there is no such thing as a reasonable RecordContext. After a join the record I am processing belongs to at least 2 topics. After a Group by the record I am processing was created from multiple offsets. The API Design do

[VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-01 Thread Jeyhun Karimov
Dear community, It seems the discussion for KIP-159 [1] has finally converged. I would like to initiate voting for this KIP. [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams Cheers, Jeyhun