Hello Jan,

Regarding your question about caching: today we already keep the record context with the cached entry, so when we flush the cache (which may forward newly generated records downstream) we set the record context appropriately; and after the flush completes we reset the context to the record that was being processed before the flush. But I think when Jeyhun does the PR it is a good time to double-check these stages to make sure we are not introducing any regressions.
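The flush behavior described above can be pictured with a simplified, self-contained sketch. All class names here are stand-ins invented for the illustration, not the actual Streams internals: each dirty cache entry carries the context of the record that created it, the processor context is switched per entry while forwarding, and then restored.

```java
import java.util.ArrayList;
import java.util.List;

public class CacheFlushSketch {
    // Hypothetical stand-in for a record context (topic, partition, offset).
    record RecordContext(String topic, int partition, long offset) {}

    // Hypothetical stand-in for the processor context: holds the "current" context.
    static class ProcessorContext {
        RecordContext current;
    }

    // Each dirty cache entry remembers the context of the record that produced it.
    record CacheEntry(String key, String value, RecordContext context) {}

    static final List<String> forwarded = new ArrayList<>();

    static void forward(ProcessorContext ctx, CacheEntry e) {
        // Downstream sees the context of the record that created the entry.
        forwarded.add(e.key() + "=" + e.value() + "@" + ctx.current.topic() + ":" + ctx.current.offset());
    }

    // Flush: set the context per entry, forward, then restore the pre-flush context.
    static void flush(ProcessorContext ctx, List<CacheEntry> dirty) {
        RecordContext before = ctx.current;
        try {
            for (CacheEntry e : dirty) {
                ctx.current = e.context();
                forward(ctx, e);
            }
        } finally {
            ctx.current = before; // reset to the record processed before the flush
        }
    }

    public static void main(String[] args) {
        ProcessorContext ctx = new ProcessorContext();
        ctx.current = new RecordContext("input", 0, 42);
        List<CacheEntry> dirty = List.of(
            new CacheEntry("a", "1", new RecordContext("input", 0, 7)),
            new CacheEntry("b", "2", new RecordContext("input", 0, 9)));
        flush(ctx, dirty);
        if (!ctx.current.equals(new RecordContext("input", 0, 42)))
            throw new AssertionError("context not restored");
        System.out.println(forwarded); // prints [a=1@input:7, b=2@input:9]
    }
}
```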
Guozhang

On Mon, Nov 6, 2017 at 8:54 PM, Jan Filipiak <jan.filip...@trivago.com> wrote:

> I agree completely.
>
> Exposing this information in a place where it has no _natural_ belonging
> might really be a bad blocker in the long run.
>
> Concerning your first point: I would argue it's not too hard to have a
> user keep track of these. If we still don't want the user to keep track
> of these, I would argue that all *projection-only* transformations on a
> source-backed KTable/KStream could also return a KTable/KStream instance
> of the type we return from the topology builder. Only after an operation
> that exceeds projection or filtering would one return a KTable that no
> longer grants access to this.
>
> Even then it's difficult already: I never ran a topology with caching,
> but I am not even 100% sure what the record context means behind a
> materialized KTable with caching. Topic and partition can probably be
> justified with some reasoning, but the offset is probably only the
> offset that caused the flush? So one might as well think of dropping
> offsets from this RecordContext.
>
> Best Jan
>
> On 07.11.2017 03:18, Guozhang Wang wrote:
>
>> Regarding the API design (the proposed set of overloads vs. one
>> overload on #map to enrich the record), I think what we have represents
>> a good trade-off between API succinctness and user convenience: on one
>> hand we definitely want to keep as few overloaded functions as
>> possible. But on the other hand, if we only do that in, say, the #map()
>> function, then this enrichment could be overkill: think of a topology
>> that has 7 operators in a chain, where users want to access the record
>> context on operators #2 and #6 only; with the "enrichment" approach
>> they need to do the enrichment on operator #2 and keep it that way
>> until #6. In addition, the RecordContext fields (topic, offset, etc.)
>> are really orthogonal to the key-value payloads themselves, so I think
>> separating them into this object is a cleaner way.
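The "enrichment" alternative contrasted above can be sketched in plain Java. The `Enriched` wrapper and `RecordContext` types are hypothetical, made up for the sketch (this is not the Streams API): once operator #2 wraps the value with its context, every later operator must carry the wrapper along until #6 finally reads it.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class EnrichmentSketch {
    // Hypothetical context and wrapper types, invented for this illustration.
    record RecordContext(String topic, long offset) {}
    record Enriched<V>(V value, RecordContext ctx) {}

    // Operators #3..#5 of the hypothetical 7-operator chain must all be
    // written against Enriched<String>, even though only #6 reads the
    // context that #2 captured.
    static List<String> pipeline(List<Enriched<String>> enrichedAtOp2) {
        Function<Enriched<String>, Enriched<String>> op3ToOp5 =
            e -> new Enriched<>(e.value().toUpperCase(), e.ctx()); // must re-wrap every time
        Function<Enriched<String>, String> op6 =
            e -> e.value() + "@" + e.ctx().offset();               // finally uses the context
        return enrichedAtOp2.stream().map(op3ToOp5).map(op6).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Enriched<String>> in = List.of(
            new Enriched<>("a", new RecordContext("t", 1)),
            new Enriched<>("b", new RecordContext("t", 2)));
        System.out.println(pipeline(in)); // prints [A@1, B@2]
    }
}
```

This is the burden the per-operator overloads in the KIP are meant to avoid: intermediate operators stay free of the wrapper type.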
>> Regarding the RecordContext inheritance, this is actually a good point
>> that has not been discussed thoroughly before. Here are my two cents:
>> one natural way would be to inherit the record context from the
>> "triggering" record; for example in a join operator, if the record from
>> stream A triggers the join, then the record context is inherited from
>> that record. This is also aligned with the lower-level PAPI interface.
>> A counter-argument, though, would be that this is sort of leaking the
>> internal implementation of the DSL, so that moving forward, if we did
>> some refactoring to our join implementations such that the triggering
>> record can change, the RecordContext would also be different. I do not
>> know how much it would really affect end users, but I would like to
>> hear your opinions.

> Agreed 100%: this is exposing internal information.

>>
>> Guozhang
>>
>> On Mon, Nov 6, 2017 at 1:00 PM, Jeyhun Karimov <je.kari...@gmail.com>
>> wrote:
>>
>>> Hi Jan,
>>>
>>> Sorry for the late reply.
>>>
>>>> The API design doesn't look appealing
>>>
>>> In terms of API design we tried to preserve the Java functional
>>> interfaces. We applied the same set of rich methods to KTable to make
>>> it compatible with the rest of the overloaded APIs.
>>>
>>>> It should be 100% sufficient to offer a KTable + KStream that is
>>>> directly fed from a topic, with 1 additional overload for the #map()
>>>> methods, to cover every use case while keeping the API in a way
>>>> better state.
>>>
>>> IMO this seems a workaround rather than a direct solution.
>>>
>>> Perhaps we should continue this discussion in the DISCUSS thread.
>>>
>>> Cheers,
>>> Jeyhun
>>>
>>> On Mon, Nov 6, 2017 at 9:14 PM Jan Filipiak <jan.filip...@trivago.com>
>>> wrote:
>>>
>>>> Hi.
>>>>
>>>> I do understand that it might come in handy. From my POV, in any
>>>> relational algebra this is only a projection. Currently we hide these
>>>> "fields" that come with the input record.
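The "triggering record" inheritance semantics discussed above can be pictured with a tiny sketch. The types are purely illustrative (the actual DSL behavior is exactly what this thread is debating, so this is one candidate semantics, not the settled one): the join result keeps the context of whichever record triggered the join.

```java
public class JoinContextSketch {
    // Hypothetical types, invented for this illustration.
    record RecordContext(String topic, long offset) {}
    record Rec<V>(V value, RecordContext ctx) {}

    // One candidate semantics from the thread: the join result inherits
    // the context of the record that *triggered* the join, not the other side.
    static Rec<String> join(Rec<String> triggering, Rec<String> other) {
        return new Rec<>(triggering.value() + "+" + other.value(), triggering.ctx());
    }

    public static void main(String[] args) {
        Rec<String> a = new Rec<>("left", new RecordContext("A", 10));
        Rec<String> b = new Rec<>("right", new RecordContext("B", 99));
        Rec<String> out = join(a, b); // the record from stream A triggered the join
        System.out.println(out.ctx().topic() + ":" + out.ctx().offset()); // prints A:10
    }
}
```

The counter-argument in the thread follows directly from this sketch: which side is `triggering` is an implementation detail, so a join refactoring could silently change the context users observe.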
>>>> It should be 100% sufficient to offer a KTable + KStream that is
>>>> directly fed from a topic, with 1 additional overload for the #map()
>>>> methods, to cover every use case while keeping the API in a way
>>>> better state.
>>>>
>>>> Best Jan
>>>>
>>>> On 06.11.2017 17:52, Matthias J. Sax wrote:
>>>>
>>>>> Jan,
>>>>>
>>>>> I understand what you are saying. However, having a RecordContext is
>>>>> super useful for operations that are applied to an input topic. Many
>>>>> users requested this feature; it's much more convenient than falling
>>>>> back to transform() to implement, for example, a filter() that wants
>>>>> to access some metadata.
>>>>>
>>>>> Because we cannot distinguish different "origins" of a
>>>>> KStream/KTable, I am not sure if there would be a better way to do
>>>>> this. The only "workaround" I see is to have two KStream/KTable
>>>>> interfaces each, and we would use the first one for KStream/KTable
>>>>> with a "proper" RecordContext. But this does not seem to be a good
>>>>> solution either.
>>>>>
>>>>> Note, a KTable can also be read directly from a topic. I agree that
>>>>> using RecordContext on a KTable that is the result of an aggregation
>>>>> is questionable. But I don't see a reason to down-vote the KIP for
>>>>> this reason.
>>>>>
>>>>> WDYT about this?
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 11/1/17 10:19 PM, Jan Filipiak wrote:
>>>>>
>>>>>> -1 (non-binding)
>>>>>>
>>>>>> I don't get the motivation. In 80% of my DSL processors there is no
>>>>>> such thing as a reasonable RecordContext. After a join, the record
>>>>>> I am processing belongs to at least 2 topics. After a group-by, the
>>>>>> record I am processing was created from multiple offsets.
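Matthias's point about falling back to transform() for a metadata-aware filter can be illustrated with a plain-Java stand-in. The `Rec`/`RecordContext` types are hypothetical and made up for the sketch; in real Streams code of this era one would use KStream#transform with a Transformer and read the metadata from ProcessorContext (topic(), partition(), offset()), which is considerably more boilerplate than a filter().

```java
import java.util.List;
import java.util.stream.Collectors;

public class MetadataFilterSketch {
    // Hypothetical types, invented for this illustration.
    record RecordContext(String topic, long offset) {}
    record Rec(String value, RecordContext ctx) {}

    // A filter that also inspects record metadata, here the source topic.
    // In the DSL today this kind of predicate cannot be expressed with
    // filter(); it requires the Transformer fallback Matthias mentions.
    static List<Rec> filterByTopic(List<Rec> in, String topic) {
        return in.stream()
                 .filter(r -> r.ctx().topic().equals(topic))
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rec> in = List.of(
            new Rec("x", new RecordContext("orders", 1)),
            new Rec("y", new RecordContext("audit", 2)));
        List<Rec> out = filterByTopic(in, "orders");
        System.out.println(out.size()); // prints 1
    }
}
```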
>>>>>>
>>>>>> The API design doesn't look appealing.
>>>>>>
>>>>>> Best Jan
>>>>>>
>>>>>> On 01.11.2017 22:02, Jeyhun Karimov wrote:
>>>>>>
>>>>>>> Dear community,
>>>>>>>
>>>>>>> It seems the discussion for KIP-159 [1] has finally converged. I
>>>>>>> would like to initiate voting for this particular KIP.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jeyhun

-- 
Guozhang