With KIP-159, the added "valueMapperWithKey" and "valueTransformerWithKey" along seem to be just adding a few syntax-sugars? E.g. we can simply implement:
mapValues(ValueMapperWithKey mapperWithKey) as map((k, v) -> (k, mapperWithKey(k, v)) ---------------------- I'm not sure how many users would really need such syntax sugars, and even they do, we can support them easily as the above implementations; Similarly for "ReducerWithKey", it can be implemented as `Aggregator<K, V, V>`, and people who needs key access can just use `aggregate` directly. The function which I think is really of valuable is `ValueJoinerWithKey`, and that seems to be the original request from users while others are coming from "API consistency" concerns. Personally I'd prefer only keep the last one (`InitializerWithKey` might also have some value, but I have not seen people widely requesting it in their DSL usage yet; if there is a common request we can keep it in this KIP as well). WDYT? Guozhang On Sun, May 28, 2017 at 10:16 AM, Jeyhun Karimov <je.kari...@gmail.com> wrote: > Thanks for clarification Matthias, now everything is clear. > > On Sun, May 28, 2017 at 6:21 PM Matthias J. Sax <matth...@confluent.io> > wrote: > > > I don't think we can drop ValueTransformerSupplier. If you don't have an > > supplier, you only get a single instance of your function. But for a > > stateful transformation, we need multiple instances (one for each task) > > of ValueTransformer. > > > > We don't need suppliers for functions like "ValueMapper" etc because > > those are stateless and thus, we can reuse a single instance over > > multiple tasks -- but we can't do this for ValueTransformer (and > similar). > > > > Btw: This reminds me about KIP-159: with regard to the RichFunction we > > might need a supplier pattern, too. (I'll comment on the other thread, > > too.) > > > > > > -Matthias > > > > On 5/28/17 5:45 AM, Jeyhun Karimov wrote: > > > Hi, > > > > > > I updated KIP. > > > Just to avoid misunderstanding, I meant deprecating > > ValueTransformerSupplier > > > and I am ok with ValueTransformer. > > > So instead of using ValueTransformerSupplier can't we directly use > > > ValueTransformer > > > or ValueTransformerWithKey? > > > > > > Btw, in current design all features of ValueTransformer will be > available > > > in ValueTransformerWithKey interface. > > > > > > Cheers, > > > Jeyhun > > > > > > On Sun, May 28, 2017 at 6:15 AM Matthias J. Sax <matth...@confluent.io > > > > > wrote: > > > > > >> Thanks Jeyhun. > > >> > > >> About ValueTransformer: I don't think we can remove it. Note, > > >> ValueTransformer allows to attach a state and also allows to register > > >> punctuations. Both those features will not be available via withKey() > > >> interfaces. > > >> > > >> -Matthias > > >> > > >> > > >> On 5/27/17 1:25 PM, Jeyhun Karimov wrote: > > >>> Hi Matthias, > > >>> > > >>> Thanks for your comments. > > >>> > > >>> I tested the deep copy approach. It has significant overhead. > > Especially > > >>> for "light" and stateless operators it slows down the topology > > >>> significantly (> 20% ). I think "warning" users about not-changing > the > > >> key > > >>> is better warning them about possible performance loss. > > >>> > > >>> About the interfaces, additionally I considered adding > > >> InitializerWithKey, > > >>> AggregatorWithKey and ValueTransformerWithKey. I think they are > > included > > >> in > > >>> PR but not in KIP. I will also include them in KIP, sorry my bad. > > >>> Including ReducerWithKey definitely makes sense. Thanks. > > >>> > > >>> One thing I want to mention is that, maybe we should deprecate > methods > > >> with > > >>> argument type ValueTransformerSupplier (KStream.transformValues(...)) > > and > > >>> and as a whole the ValueTransformerSupplier interface. > > >>> We can use ValueTransformer/ValueTransformerWithKey type instead > > without > > >>> additional supplier layer. > > >>> > > >>> > > >>> Cheers, > > >>> Jeyhun > > >>> > > >>> > > >>> On Thu, May 25, 2017 at 1:07 AM Matthias J. Sax < > matth...@confluent.io > > > > > >>> wrote: > > >>> > > >>>> One more question: > > >>>> > > >>>> Should we add any of > > >>>> - InitizialierWithKey > > >>>> - ReducerWithKey > > >>>> - ValueTransformerWithKey > > >>>> > > >>>> To get consistent/complete API, it might be a good idea. Any > thoughts? > > >>>> > > >>>> > > >>>> -Matthias > > >>>> > > >>>> > > >>>> On 5/24/17 3:47 PM, Matthias J. Sax wrote: > > >>>>> Jeyhun, > > >>>>> > > >>>>> I was just wondering if you did look into the key-deep-copy idea we > > >>>>> discussed. I am curious to see what the impact might be. > > >>>>> > > >>>>> > > >>>>> -Matthias > > >>>>> > > >>>>> On 5/20/17 2:03 AM, Jeyhun Karimov wrote: > > >>>>>> Hi, > > >>>>>> > > >>>>>> Thanks for your comments. I rethink about including rich functions > > >> into > > >>>>>> this KIP. > > >>>>>> I think once we include rich functions in this KIP and then fix > > >>>>>> ProcessorContext in another KIP and incorporate with existing rich > > >>>>>> functions, the code will not be backwards compatible. > > >>>>>> > > >>>>>> I see Damian's and your point more clearly now. > > >>>>>> > > >>>>>> Conclusion: we include only withKey interfaces in this KIP (I > > updated > > >>>> the > > >>>>>> KIP), I will try also initiate another KIP for rich functions. > > >>>>>> > > >>>>>> Cheers, > > >>>>>> Jeyhun > > >>>>>> > > >>>>>> On Fri, May 19, 2017 at 10:50 PM Matthias J. Sax < > > >> matth...@confluent.io > > >>>>> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> With the current KIP, using ValueMapper and ValueMapperWithKey > > >>>>>>> interfaces, RichFunction seems to be an independent add-on. To > fix > > >> the > > >>>>>>> original issue to allow key access, RichFunctions are not > required > > >>>> IMHO. > > >>>>>>> > > >>>>>>> I initially put the RichFunction idea on the table, because I was > > >>>> hoping > > >>>>>>> to get a uniform API. And I think, is was good to consider them > -- > > >>>>>>> however, the discussion showed that they are not necessary for > key > > >>>>>>> access. Thus, it seems to be better to handle RichFunctions in an > > own > > >>>>>>> KIP. The ProcessorContext/RecordContext issues seems to be a main > > >>>>>>> challenge for this. And introducing RichFunctions with > > parameter-less > > >>>>>>> init() method, seem not to help too much. We would get an > > >>>> "intermediate" > > >>>>>>> API that we want to change anyway later on... > > >>>>>>> > > >>>>>>> As you put already much effort into RichFunction, feel free do > push > > >>>> this > > >>>>>>> further and start a new KIP (we can do this even in parallel) -- > we > > >>>>>>> don't want to slow you down :) But it make the discussion and > code > > >>>>>>> review easier, if we separate both IMHO. > > >>>>>>> > > >>>>>>> > > >>>>>>> -Matthias > > >>>>>>> > > >>>>>>> > > >>>>>>> On 5/19/17 2:25 AM, Jeyhun Karimov wrote: > > >>>>>>>> Hi Damian, > > >>>>>>>> > > >>>>>>>> Thanks for your comments. I think providing to users *interface* > > >>>> rather > > >>>>>>>> than *abstract class* should be preferred (Matthias also raised > > this > > >>>>>>> issue > > >>>>>>>> ), anyway I changed the corresponding parts of KIP. > > >>>>>>>> > > >>>>>>>> Regarding with passing additional contextual information, I > think > > it > > >>>> is a > > >>>>>>>> tradeoff, > > >>>>>>>> 1) first, we fix the context parameter for *init() *method in > > >> another > > >>>> PR > > >>>>>>>> and solve Rich functions afterwards > > >>>>>>>> 2) first, we fix the requested issues on jira ([1-3]) with > > providing > > >>>>>>>> (not-complete) Rich functions and integrate the context > parameters > > >> to > > >>>>>>> this > > >>>>>>>> afterwards (like improvement) > > >>>>>>>> > > >>>>>>>> To me, the second approach seems more incremental. However you > are > > >>>> right, > > >>>>>>>> the names might confuse the users. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> [1] https://issues.apache.org/jira/browse/KAFKA-4218 > > >>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-4726 > > >>>>>>>> [3] https://issues.apache.org/jira/browse/KAFKA-3745 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Cheers, > > >>>>>>>> Jeyhun > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Fri, May 19, 2017 at 10:42 AM Damian Guy < > damian....@gmail.com > > > > > >>>>>>> wrote: > > >>>>>>>> > > >>>>>>>>> Hi, > > >>>>>>>>> > > >>>>>>>>> I see you've removed the `ProcessorContext` from the > RichFunction > > >>>> which > > >>>>>>> is > > >>>>>>>>> good, but why is it a `RichFunction`? I'd have expected it to > > pass > > >>>> some > > >>>>>>>>> additional contextual information, i.e., the `RecordContext` > that > > >>>>>>> contains > > >>>>>>>>> just the topic, partition, timestamp, offset. I'm ok with it > not > > >>>>>>> passing > > >>>>>>>>> this contextual information, but is the naming incorrect? I'm > not > > >>>> sure, > > >>>>>>>>> tbh. I'm wondering if we should drop `RichFunctions` until we > can > > >> do > > >>>> it > > >>>>>>>>> properly with the correct context? > > >>>>>>>>> > > >>>>>>>>> Also, i don't like the abstract classes: RichValueMapper, > > >>>>>>> RichValueJoiner, > > >>>>>>>>> RichInitializer etc. Why can't they not just be interfaces? > > >>>> Generally we > > >>>>>>>>> should provide users with Intefaces to code against, not > classes. > > >>>>>>>>> > > >>>>>>>>> Thanks, > > >>>>>>>>> Damian > > >>>>>>>>> > > >>>>>>>>> On Fri, 19 May 2017 at 00:50 Jeyhun Karimov < > > je.kari...@gmail.com> > > >>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi, > > >>>>>>>>>> > > >>>>>>>>>> Thanks. I initiated the PR as well, to get a better overview. > > >>>>>>>>>> > > >>>>>>>>>> The only reason that I used abstract class instead of > interface > > >> for > > >>>>>>> Rich > > >>>>>>>>>> functions is that in future if we will have some > > >>>> AbstractRichFunction > > >>>>>>>>>> abstract classes, > > >>>>>>>>>> we can easily extend: > > >>>>>>>>>> > > >>>>>>>>>> public abstract class RichValueMapper<K, V, VR> implements > > >>>>>>>>>> ValueMapperWithKey<K, V, VR>, RichFunction *extends > > >>>>>>>>> AbstractRichFunction*{ > > >>>>>>>>>> } > > >>>>>>>>>> With interfaces we are only limited to interfaces for > > >> inheritance. > > >>>>>>>>>> > > >>>>>>>>>> However, I think we can easily change it (from abstract class > -> > > >>>>>>>>> interface) > > >>>>>>>>>> if you think interface is a better fit. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Cheers, > > >>>>>>>>>> Jeyhun > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax < > > >>>> matth...@confluent.io > > >>>>>>>> > > >>>>>>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> Thanks for the update and explanations. The KIP is quite > > improved > > >>>> now > > >>>>>>>>> -- > > >>>>>>>>>>> great job! > > >>>>>>>>>>> > > >>>>>>>>>>> One more question: Why are RichValueMapper etc abstract > classes > > >> and > > >>>>>>> not > > >>>>>>>>>>> interfaces? > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Overall, I like the KIP a lot! > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> -Matthias > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On 5/16/17 7:03 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>> Hi, > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks for your comments. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I think supporting Lambdas for `withKey` and > > >>>> `AbstractRichFunction` > > >>>>>>>>>>>>> don't go together, as Lambdas are only supported for > > interfaces > > >>>>>>>>> AFAIK. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Maybe I misunderstood your comment. > > >>>>>>>>>>>> *withKey* and and *withOnlyValue* are interfaces. So they > > don't > > >>>> have > > >>>>>>>>>>> direct > > >>>>>>>>>>>> relation with *AbstractRichFunction*. > > >>>>>>>>>>>> *withKey* and and *withOnlyValue* interfaces have only one > > >>>> method , > > >>>>>>>>> so > > >>>>>>>>>>> we > > >>>>>>>>>>>> can use lambdas. > > >>>>>>>>>>>> Where does the *AbstractRichFunction* comes to play? Inside > > Rich > > >>>>>>>>>>> functions. > > >>>>>>>>>>>> And we use Rich functions in 2 places: > > >>>>>>>>>>>> > > >>>>>>>>>>>> 1. User doesn't use rich functions. Just regular *withKey* > and > > >> and > > >>>>>>>>>>>> *withOnlyValue* interfaces(both support lambdas) . We get > > those > > >>>>>>>>>>> interfaces > > >>>>>>>>>>>> and wrap into Rich function while building the topology, and > > >> send > > >>>> to > > >>>>>>>>>>>> Processor. > > >>>>>>>>>>>> 2. User does use rich functions (Rich functions implement > > >>>> *withKey* > > >>>>>>>>>>>> interface). As a result no lamdas here by definition. In > this > > >>>> case, > > >>>>>>>>>> while > > >>>>>>>>>>>> building the topology we do a type check if the object type > is > > >>>>>>>>>>>> *withKey* or *RichFunction*. > > >>>>>>>>>>>> > > >>>>>>>>>>>> So *AbstractRichFunction* is just syntactic sugar for Rich > > >>>> functions > > >>>>>>>>>> and > > >>>>>>>>>>>> does not affect using lambdas. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need > to > > >>>> have a > > >>>>>>>>>>>>> interface approach like this > > >>>>>>>>>>>>> - RichFunction -> only adding init() and close() > > >>>>>>>>>>>>> - ValueMapper > > >>>>>>>>>>>>> - ValueMapperWithKey > > >>>>>>>>>>>>> - RichValueMapper extends ValueMapperWithKey, > RichFunction > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> As I said above, currently we support lambdas for *withKey* > > >>>>>>>>> interfaces > > >>>>>>>>>> as > > >>>>>>>>>>>> well. However, I agree with your idea and I will remove the > > >>>>>>>>>>>> AbstractRichFunction from the design. > > >>>>>>>>>>>> > > >>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to > > >>>> support > > >>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended > > >> API". > > >>>>>>>>> For > > >>>>>>>>>>>>> this, RichFunction could add key+init+close and > > >>>> AbstractRichFunction > > >>>>>>>>>>>>> would allow to only care about getting the key. > > >>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of > more > > >>>>>>>>>> overloaded > > >>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much > > >> because > > >>>> we > > >>>>>>>>>> have > > >>>>>>>>>>>>> already so many overlaods. On the other hand, I do see > value > > in > > >>>>>>>>>>>>> supporting Lambdas for `withKey`. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Just to clarify, with current design we have only one extra > > >>>>>>>>> overloaded > > >>>>>>>>>>>> method per *withOnlyValue* interface: which is *withKey* > > >> version > > >>>> of > > >>>>>>>>>>>> particular interface. > > >>>>>>>>>>>> We don't need extra overload for Rich function as Rich > > function > > >>>>>>>>>>> implements > > >>>>>>>>>>>> *withKey* interface as a result they have same type. We > > >>>> differentiate > > >>>>>>>>>>> them > > >>>>>>>>>>>> while building the topology. > > >>>>>>>>>>>> We supported lambdas for *withKey* APIs because of the > > comment: > > >>>>>>>>>>>> > > >>>>>>>>>>>> @Jeyhun: I did not put any thought into this, but can we > have > > a > > >>>>>>>>> design > > >>>>>>>>>>>>> that allows for both? Also, with regard to lambdas, it > might > > >> make > > >>>>>>>>>> sense > > >>>>>>>>>>>>> to allow for both `V -> newV` and `(K, V) -> newV` ? > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> However, I don't think that this complicates the overall > > design > > >>>>>>>>>>>> significantly. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Depending on what we want to support, it might make sense to > > >>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, > this > > >>>> also > > >>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP" > before > > >>>> driving > > >>>>>>>>>>>>> this KIP further. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Based on our discussion I think we should keep Rich > functions > > >> as I > > >>>>>>>>>> don't > > >>>>>>>>>>>> think that they bring extra layer of overhead to library. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Any comments are appreciated. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Cheers, > > >>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax < > > >>>>>>>>>> matth...@confluent.io> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Jeyhun, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> thanks for the update. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I think supporting Lambdas for `withKey` and > > >>>> `AbstractRichFunction` > > >>>>>>>>>>>>> don't go together, as Lambdas are only supported for > > interfaces > > >>>>>>>>> AFAIK. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need > to > > >>>> have a > > >>>>>>>>>>>>> interface approach like this > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> - RichFunction -> only adding init() and close() > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> - ValueMapper > > >>>>>>>>>>>>> - ValueMapperWithKey > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> - RichValueMapper extends ValueMapperWithKey, > RichFunction > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> For this approach, AbstractRichFunction does not make sense > > >>>> anymore, > > >>>>>>>>>> as > > >>>>>>>>>>>>> the only purpose of `RichFunction` is to allow the > > >>>> implementation of > > >>>>>>>>>>>>> init() and close() -- if you don't want those, you would > > >>>> implement a > > >>>>>>>>>>>>> different interface (ie, ValueMapperWithKey) > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to > > >>>> support > > >>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended > > >> API". > > >>>>>>>>> For > > >>>>>>>>>>>>> this, RichFunction could add key+init+close and > > >>>> AbstractRichFunction > > >>>>>>>>>>>>> would allow to only care about getting the key. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of > more > > >>>>>>>>>> overloaded > > >>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much > > >> because > > >>>> we > > >>>>>>>>>> have > > >>>>>>>>>>>>> already so many overlaods. On the other hand, I do see > value > > in > > >>>>>>>>>>>>> supporting Lambdas for `withKey`. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Depending on what we want to support, it might make sense > to > > >>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, > this > > >>>> also > > >>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP" > before > > >>>> driving > > >>>>>>>>>>>>> this KIP further. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Thoughts? > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Sorry for super late response. Thanks for your comments. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little > > >> bit? I > > >>>>>>>>>> cannot > > >>>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem > > is. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - From [1] says "A functional interface is an interface > that > > >> has > > >>>>>>>>> just > > >>>>>>>>>>> one > > >>>>>>>>>>>>>> abstract method, and thus represents a single function > > >>>> contract". > > >>>>>>>>>>>>>> So basically once we extend some interface from another > (in > > >> our > > >>>>>>>>> case, > > >>>>>>>>>>>>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas > > in > > >>>> the > > >>>>>>>>>>>>> extended > > >>>>>>>>>>>>>> interface. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Further comments: > > >>>>>>>>>>>>>>> - The KIP get a little hard to read -- can you maybe > > >> reformat > > >>>> the > > >>>>>>>>>>> wiki > > >>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - I will work on the KIP. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - What about KStream-KTable joins? You don't have > overlaods > > >>>> added > > >>>>>>>>>> for > > >>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to > add > > >> any > > >>>> new > > >>>>>>>>>>>>>>> overloads) > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - Actually there are more than one Processor and public > APIs > > >> to > > >>>> be > > >>>>>>>>>>>>>> changed (KStream-KTable > > >>>>>>>>>>>>>> joins is one case). However all of them has similar > > structure: > > >>>> we > > >>>>>>>>>>>>> overload > > >>>>>>>>>>>>>> the *method* with *methodWithKey*, > > >>>>>>>>>>>>>> wrap it into the Rich function, send to processor and > inside > > >> the > > >>>>>>>>>>>>> processor > > >>>>>>>>>>>>>> call *init* and *close* methods of the Rich function. > > >>>>>>>>>>>>>> As I wrote in KIP, I wanted to demonstrate the overall > idea > > >> with > > >>>>>>>>> only > > >>>>>>>>>>>>>> *ValueMapper* as the same can be applied to all changes. > > >>>>>>>>>>>>>> Anyway I will update the KIP. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - Why do we need `AbstractRichFunction`? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Instead of overriding the *init(ProcessorContext p)* and* > > >>>> close()* > > >>>>>>>>>>>>> methods > > >>>>>>>>>>>>>> in every Rich function with empty body like: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> @Override > > >>>>>>>>>>>>>> void init(ProcessorContext context) {} > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> @Override > > >>>>>>>>>>>>>> void close () {} > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I thought that we can override them once in > > >>>> *AbstractRichFunction* > > >>>>>>>>>> and > > >>>>>>>>>>>>>> extent new Rich functions from *AbstractRichFunction*. > > >>>>>>>>>>>>>> Basically this can eliminate code copy-paste and ease the > > >>>>>>>>>> maintenance. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> - What about interfaces Initializer, ForeachAction, > Merger, > > >>>>>>>>>> Predicate, > > >>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to > all, > > >> but > > >>>> we > > >>>>>>>>>>> should > > >>>>>>>>>>>>>>> discuss all of them and add where it does make sense > (e.g., > > >>>>>>>>>>>>>>> RichForachAction does make sense IMHO) > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Definitely agree. As I said, the same technique applies to > > all > > >>>> this > > >>>>>>>>>>>>>> interfaces and I didn't want to explode the KIP, just > wanted > > >> to > > >>>>>>>>> give > > >>>>>>>>>>> the > > >>>>>>>>>>>>>> overall intuition. > > >>>>>>>>>>>>>> However, I will update the KIP as I said. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` -- > > >>>>>>>>>>> `RichValueXX` > > >>>>>>>>>>>>>>> in general -- but why can't we do all this with > interfaces > > >>>> only? > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Sure we can. However the main intuition is we should not > > force > > >>>>>>>>> users > > >>>>>>>>>> to > > >>>>>>>>>>>>>> implement *init(ProcessorContext)* and *close()* functions > > >> every > > >>>>>>>>> time > > >>>>>>>>>>>>> they > > >>>>>>>>>>>>>> use Rich functions. > > >>>>>>>>>>>>>> If one needs, she can override the respective methods. > > >> However, > > >>>> I > > >>>>>>>>> am > > >>>>>>>>>>> open > > >>>>>>>>>>>>>> for discussion. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I'd rather not see the use of `ProcessorContext` spread > any > > >>>>>>>>> further > > >>>>>>>>>>> than > > >>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is > done > > >>>> before > > >>>>>>>>>>> this? > > >>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too > > >> large. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> That is good point. I wanted to make > > *init(ProcessorContext)* > > >>>>>>>>> method > > >>>>>>>>>>>>>> persistent among the library (which use ProcessorContext > as > > an > > >>>>>>>>>> input), > > >>>>>>>>>>>>>> therefore I put *ProcessorContext* as an input. > > >>>>>>>>>>>>>> So the important question is that (as @dguy and @mjsax > > >>>> mentioned) > > >>>>>>>>>>> whether > > >>>>>>>>>>>>>> continue this KIP without providing users an access to > > >>>>>>>>>>> *ProcessorContext* > > >>>>>>>>>>>>>> (change *init (ProcessorContext)* to * init()* ) or > > >>>>>>>>>>>>>> initiate another KIP before this. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> [1] > > >>>>>>>>>>>>>> > > http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java- > > >>>>>>>>>>>>> se-8-jls-pfd-diffs.pdf > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy < > > >>>> damian....@gmail.com> > > >>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> I'd rather not see the use of `ProcessorContext` spread > > any > > >>>>>>>>> further > > >>>>>>>>>>>>> than > > >>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is > done > > >>>> before > > >>>>>>>>>>> this? > > >>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too > > >> large. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax < > > >>>>>>>>> matth...@confluent.io > > >>>>>>>>>>> > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I agree that that `ProcessorContext` interface is too > > broad > > >> in > > >>>>>>>>>>> general > > >>>>>>>>>>>>>>>> -- this is even true for transform/process, and it's > also > > >>>>>>>>> reflected > > >>>>>>>>>>> in > > >>>>>>>>>>>>>>>> the API improvement list we want to do. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/ > > >>>>>>>>>>>>>>> Kafka+Streams+Discussions > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> So I am wondering, if you question the `RichFunction` > > >>>> approach in > > >>>>>>>>>>>>>>>> general? Or if you suggest to either extend the scope of > > >> this > > >>>> KIP > > >>>>>>>>>> to > > >>>>>>>>>>>>>>>> include this---or maybe better, do another KIP for it > and > > >>>> delay > > >>>>>>>>>> this > > >>>>>>>>>>>>> KIP > > >>>>>>>>>>>>>>>> until the other one is done? > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On 5/15/17 2:35 AM, Damian Guy wrote: > > >>>>>>>>>>>>>>>>> Thanks for the KIP. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I'm not convinced on the `RichFunction` approach. Do we > > >>>> really > > >>>>>>>>>> want > > >>>>>>>>>>> to > > >>>>>>>>>>>>>>>> give > > >>>>>>>>>>>>>>>>> every DSL method access to the `ProcessorContext` ? It > > has > > >> a > > >>>>>>>>> bunch > > >>>>>>>>>>> of > > >>>>>>>>>>>>>>>>> methods on it that seem in-appropriate for some of the > > DSL > > >>>>>>>>>> methods, > > >>>>>>>>>>>>>>> i.e, > > >>>>>>>>>>>>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. > > It > > >> is > > >>>>>>>>> far > > >>>>>>>>>>> too > > >>>>>>>>>>>>>>>>> broad. I think it would be better to have a narrower > > >>>> interface > > >>>>>>>>>> like > > >>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>> `RecordContext` - remembering it is easier to add > > >>>>>>>>>>> methods/interfaces > > >>>>>>>>>>>>>>>> later > > >>>>>>>>>>>>>>>>> than to remove them > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax < > > >>>>>>>>>> matth...@confluent.io > > >>>>>>>>>>>> > > >>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Jeyhun, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a > > little > > >>>> bit? > > >>>>>>>>> I > > >>>>>>>>>>>>>>> cannot > > >>>>>>>>>>>>>>>>>> follow the explanation in the KIP to see what the > > problem > > >>>> is. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> For updating the KIP title I don't know -- guess it's > up > > >> to > > >>>>>>>>> you. > > >>>>>>>>>>>>>>> Maybe a > > >>>>>>>>>>>>>>>>>> committer can comment on this? > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Further comments: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> - The KIP get a little hard to read -- can you maybe > > >>>> reformat > > >>>>>>>>>> the > > >>>>>>>>>>>>>>> wiki > > >>>>>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would > help. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> - What about KStream-KTable joins? You don't have > > >> overlaods > > >>>>>>>>>> added > > >>>>>>>>>>>>> for > > >>>>>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to > > add > > >>>> any > > >>>>>>>>>> new > > >>>>>>>>>>>>>>>>>> overloads) > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> - Why do we need `AbstractRichFunction`? > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> - What about interfaces Initializer, ForeachAction, > > >> Merger, > > >>>>>>>>>>>>>>> Predicate, > > >>>>>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to > > all, > > >>>> but > > >>>>>>>>> we > > >>>>>>>>>>>>>>> should > > >>>>>>>>>>>>>>>>>> discuss all of them and add where it does make sense > > >> (e.g., > > >>>>>>>>>>>>>>>>>> RichForachAction does make sense IMHO) > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- > `ValueXXWithKey` > > -- > > >>>>>>>>>>>>>>> `RichValueXX` > > >>>>>>>>>>>>>>>>>> in general -- but why can't we do all this with > > interfaces > > >>>>>>>>> only? > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Thanks for your comments. I think we cannot extend > the > > >> two > > >>>>>>>>>>>>> interfaces > > >>>>>>>>>>>>>>>> if > > >>>>>>>>>>>>>>>>>> we > > >>>>>>>>>>>>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I > > >> should > > >>>>>>>>>> change > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>> title, because now we are not limiting the KIP to > only > > >>>>>>>>>>> ValueMapper, > > >>>>>>>>>>>>>>>>>>> ValueTransformer and ValueJoiner. > > >>>>>>>>>>>>>>>>>>> Please feel free to comment. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> [1] > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > >>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+ > > >>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax < > > >>>>>>>>>>>>>>> matth...@confluent.io> > > >>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we > don't > > >>>> need > > >>>>>>>>> the > > >>>>>>>>>>> new > > >>>>>>>>>>>>>>>>>>>> overlaod. > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> And yes, we need to do one check -- but this happens > > >> when > > >>>>>>>>>>> building > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>> topology. At runtime (I mean after > KafkaStream#start() > > >> we > > >>>>>>>>> don't > > >>>>>>>>>>>>> need > > >>>>>>>>>>>>>>>> any > > >>>>>>>>>>>>>>>>>>>> check as we will always use `ValueMapperWithKey`) > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Thanks for feedback. > > >>>>>>>>>>>>>>>>>>>>> Then we need to overload method > > >>>>>>>>>>>>>>>>>>>>> <VR> KStream<K, VR> mapValues(ValueMapper<? super > > V, > > >> ? > > >>>>>>>>>> extends > > >>>>>>>>>>>>>>> VR> > > >>>>>>>>>>>>>>>>>>>>> mapper); > > >>>>>>>>>>>>>>>>>>>>> with > > >>>>>>>>>>>>>>>>>>>>> <VR> KStream<K, VR> > mapValues(ValueMapperWithKey<? > > >>>> super > > >>>>>>>>> V, > > >>>>>>>>>> ? > > >>>>>>>>>>>>>>>> extends > > >>>>>>>>>>>>>>>>>>>> VR> > > >>>>>>>>>>>>>>>>>>>>> mapper); > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> and in runtime (inside processor) we still have to > > >> check > > >>>> it > > >>>>>>>>> is > > >>>>>>>>>>>>>>>>>>>> ValueMapper > > >>>>>>>>>>>>>>>>>>>>> or ValueMapperWithKey before wrapping it into the > > rich > > >>>>>>>>>> function. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki < > > >>>>>>>>>>>>>>>>>>>>> michal.borowie...@openbet.com> wrote: > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> +1 :) > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote: > > >>>>>>>>>>>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> I was reading the updated KIP and I am wondering, > > if > > >> we > > >>>>>>>>>> should > > >>>>>>>>>>>>> do > > >>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>>>> design a little different. > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> Instead of distinguishing between a RichFunction > > and > > >>>>>>>>>>>>>>>> non-RichFunction > > >>>>>>>>>>>>>>>>>>>> at > > >>>>>>>>>>>>>>>>>>>>>>> runtime level, we would use RichFunctions all the > > >> time. > > >>>>>>>>>> Thus, > > >>>>>>>>>>> on > > >>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>> DSL > > >>>>>>>>>>>>>>>>>>>>>>> entry level, if a user provides a > non-RichFunction, > > >> we > > >>>>>>>>> wrap > > >>>>>>>>>> it > > >>>>>>>>>>>>>>> by a > > >>>>>>>>>>>>>>>>>>>>>>> RichFunction that is fully implemented by > Streams. > > >> This > > >>>>>>>>>>>>>>>> RichFunction > > >>>>>>>>>>>>>>>>>>>>>>> would just forward the call omitting the key: > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> Just to sketch the idea (incomplete code > snippet): > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> public StreamsRichValueMapper implements > > >>>>>>>>> RichValueMapper() > > >>>>>>>>>> { > > >>>>>>>>>>>>>>>>>>>>>>>> private ValueMapper userProvidedMapper; // > set > > by > > >>>>>>>>>>>>> constructor > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> public VR apply(final K key, final V1 value1, > > >>>> final V2 > > >>>>>>>>>>>>>>> value2) > > >>>>>>>>>>>>>>>> { > > >>>>>>>>>>>>>>>>>>>>>>>> return userProvidedMapper(value1, > value2); > > >>>>>>>>>>>>>>>>>>>>>>>> } > > >>>>>>>>>>>>>>>>>>>>>>>> } > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> From a performance point of view, I am not sure > if > > >> the > > >>>>>>>>>>>>>>>>>>>>>>> "if(isRichFunction)" including casts etc would > have > > >>>> more > > >>>>>>>>>>>>> overhead > > >>>>>>>>>>>>>>>>>> than > > >>>>>>>>>>>>>>>>>>>>>>> this approach (we would do more nested method > call > > >> for > > >>>>>>>>>>>>>>>>>> non-RichFunction > > >>>>>>>>>>>>>>>>>>>>>>> which should be more common than RichFunctions). > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> This approach should not effect lambdas (or do I > > miss > > >>>>>>>>>>>>> something?) > > >>>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>>>>>>> might be cleaner, as we could have one more top > > level > > >>>>>>>>>>> interface > > >>>>>>>>>>>>>>>>>>>>>>> `RichFunction` with methods `init()` and > `close()` > > >> and > > >>>>>>>>> also > > >>>>>>>>>>>>>>>>>> interfaces > > >>>>>>>>>>>>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract > > classes > > >>>>>>>>>>> required). > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> Any thoughts? > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to > > >> include > > >>>>>>>>> rich > > >>>>>>>>>>>>>>>>>> functions. > > >>>>>>>>>>>>>>>>>>>> I > > >>>>>>>>>>>>>>>>>>>>>>>> will still have to evaluate the cost of deep > > copying > > >>>> of > > >>>>>>>>>> keys. > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak < > > >>>>>>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com> > > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> Hey Matthias, > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> My opinion would be that documenting the > > >>>> immutability of > > >>>>>>>>>> the > > >>>>>>>>>>>>>>> key > > >>>>>>>>>>>>>>>> is > > >>>>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>>>>>> best approach available. Better than requiring > > the > > >>>> key > > >>>>>>>>> to > > >>>>>>>>>>> be > > >>>>>>>>>>>>>>>>>>>>>> serializable > > >>>>>>>>>>>>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no > > >>>> performance > > >>>>>>>>>> risk. > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> It'd be different if Java had immutable type > > >>>> constraints > > >>>>>>>>>> of > > >>>>>>>>>>>>>>> some > > >>>>>>>>>>>>>>>>>>>> kind. > > >>>>>>>>>>>>>>>>>>>>>> :-) > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> Mathieu > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. > Sax > > < > > >>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io> > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this > > path, > > >>>> it > > >>>>>>>>>>> should > > >>>>>>>>>>>>>>>> cover > > >>>>>>>>>>>>>>>>>>>>>>>>>> all(?) DSL interfaces. > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>> About guarding the key -- I am still not sure > > what > > >>>> to > > >>>>>>>>> do > > >>>>>>>>>>>>> about > > >>>>>>>>>>>>>>>>>> it... > > >>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it might be enough to document it (and > > name > > >>>> the > > >>>>>>>>> key > > >>>>>>>>>>>>>>>>>> parameter > > >>>>>>>>>>>>>>>>>>>>>> like > > >>>>>>>>>>>>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). > > Ultimately, > > >> I > > >>>>>>>>> would > > >>>>>>>>>>>>>>> prefer > > >>>>>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>>>>>>>>>> guard against any modification, but I have no > > good > > >>>> idea > > >>>>>>>>>> how > > >>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>> do > > >>>>>>>>>>>>>>>>>>>>>> this. > > >>>>>>>>>>>>>>>>>>>>>>>>>> Not sure what others think about the risk of > > >>>> corrupted > > >>>>>>>>>>>>>>>>>> partitioning > > >>>>>>>>>>>>>>>>>>>>>>>>>> (what would be a user error and we could say, > > >> well, > > >>>> bad > > >>>>>>>>>>> luck, > > >>>>>>>>>>>>>>>> you > > >>>>>>>>>>>>>>>>>>>> got > > >>>>>>>>>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs > deep > > >>>> copy > > >>>>>>>>>> with > > >>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>> potential > > >>>>>>>>>>>>>>>>>>>>>>>>>> performance hit (that we can't quantity atm > > >> without > > >>>> any > > >>>>>>>>>>>>>>>>>> performance > > >>>>>>>>>>>>>>>>>>>>>>>>> test). > > >>>>>>>>>>>>>>>>>>>>>>>>>> We do have a performance system test. Maybe > it's > > >>>> worth > > >>>>>>>>>> for > > >>>>>>>>>>>>> you > > >>>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>>>>>> apply > > >>>>>>>>>>>>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's > > very > > >>>>>>>>> basic > > >>>>>>>>>>>>>>>>>> performance > > >>>>>>>>>>>>>>>>>>>>>>>>>> test only, but might give some insight. If you > > >> want > > >>>> to > > >>>>>>>>> do > > >>>>>>>>>>>>>>> this, > > >>>>>>>>>>>>>>>>>> look > > >>>>>>>>>>>>>>>>>>>>>>>>>> into folder "tests" for general test setup, > and > > >> into > > >>>>>>>>>>>>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find > > find > > >>>> the > > >>>>>>>>>> perf > > >>>>>>>>>>>>>>>> test. > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Matthias, > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I think extending KIP to include > RichFunctions > > >>>> totally > > >>>>>>>>>>>>> makes > > >>>>>>>>>>>>>>>>>>>> sense. > > >>>>>>>>>>>>>>>>>>>>>>>>> So, > > >>>>>>>>>>>>>>>>>>>>>>>>>>> we don't want to guard the keys because it > is > > >>>>>>>>> costly. > > >>>>>>>>>>>>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it > should > > >>>> not be > > >>>>>>>>>>>>>>> limited > > >>>>>>>>>>>>>>>>>> only > > >>>>>>>>>>>>>>>>>>>>>>>>> the 3 > > >>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range > of > > >>>>>>>>>>> interfaces. > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong. > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. > > Sax < > > >>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io > > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> One follow up. There was this email on the > > user > > >>>> list: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj= > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+ > > >>>>>>>>>>>>>>> key+ > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> It might make sense so include Initializer, > > >> Adder, > > >>>>>>>>> and > > >>>>>>>>>>>>>>>>>> Substractor > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> inferface, too. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> And we should double check if there are > other > > >>>>>>>>> interface > > >>>>>>>>>>> we > > >>>>>>>>>>>>>>>> might > > >>>>>>>>>>>>>>>>>>>>>> miss > > >>>>>>>>>>>>>>>>>>>>>>>>>> atm. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for updating the KIP. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, > but > > I > > >> am > > >>>>>>>>>>> actually > > >>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>> little > > >>>>>>>>>>>>>>>>>>>>>>>>> bit > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> worried about performance impact... We > might > > >>>> want to > > >>>>>>>>>> do > > >>>>>>>>>>>>>>> some > > >>>>>>>>>>>>>>>>>> test > > >>>>>>>>>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> quantify this impact. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of > > >>>> `RichFunction` > > >>>>>>>>>>>>>>>> interface > > >>>>>>>>>>>>>>>>>>>> that > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> would allow users to access record metadata > > >> (like > > >>>>>>>>>>>>>>> timestamp, > > >>>>>>>>>>>>>>>>>>>>>> offset, > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a > > >>>> similar > > >>>>>>>>>>>>> concept. > > >>>>>>>>>>>>>>>>>>>> Thus, I > > >>>>>>>>>>>>>>>>>>>>>>>>> am > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wondering, if it would make sense to > enlarge > > >> the > > >>>>>>>>> scope > > >>>>>>>>>>> of > > >>>>>>>>>>>>>>>> this > > >>>>>>>>>>>>>>>>>>>> KIP > > >>>>>>>>>>>>>>>>>>>>>> by > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that? WDYT? > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mathieu, > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar > > >> approach > > >>>>>>>>> and > > >>>>>>>>>>>>>>> updated > > >>>>>>>>>>>>>>>>>> PR > > >>>>>>>>>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>>>>>>>>>> KIP > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in > > >>>> Processors > > >>>>>>>>>>>>>>> sending > > >>>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>>> copy > > >>>>>>>>>>>>>>>>>>>>>>>>> of > > >>>>>>>>>>>>>>>>>>>>>>>>>> an > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual key. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an > object, I > > >>>> think > > >>>>>>>>>>> memory > > >>>>>>>>>>>>>>>> can > > >>>>>>>>>>>>>>>>>> be > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> bottleneck > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in some use-cases. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu > > >> Fenniak < > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun, > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper > > >>>> (...etc) to > > >>>>>>>>>> be > > >>>>>>>>>>>>>>>>>> classes, > > >>>>>>>>>>>>>>>>>>>>>>>>> rather > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> than > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards > > >>>> incompatible > > >>>>>>>>>>>>>>> change. > > >>>>>>>>>>>>>>>>>> An > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> alternative > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach that would be backwards > compatible > > >>>> would > > >>>>>>>>> be > > >>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>> define > > >>>>>>>>>>>>>>>>>>>>>> new > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where > > those > > >>>>>>>>>>> interfaces > > >>>>>>>>>>>>>>>> are > > >>>>>>>>>>>>>>>>>>>>>> used. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter > as > > >>>> "final" > > >>>>>>>>>>>>>>> doesn't > > >>>>>>>>>>>>>>>>>>>> change > > >>>>>>>>>>>>>>>>>>>>>>>>>> much > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about guarding against key change. It > only > > >>>>>>>>> prevents > > >>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>>> parameter > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> variable > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from being reassigned. If the key type > is > > a > > >>>>>>>>> mutable > > >>>>>>>>>>>>>>> object > > >>>>>>>>>>>>>>>>>>>> (eg. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> byte[]), > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = > 0). > > >> But > > >>>>>>>>> I'm > > >>>>>>>>>>> not > > >>>>>>>>>>>>>>>>>> really > > >>>>>>>>>>>>>>>>>>>>>> sure > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> there's > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> much that can be done about that. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun > > >> Karimov > > >>>> < > > >>>>>>>>>>>>>>>>>>>>>>>>> je.kari...@gmail.com > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we > can > > >>>> guard > > >>>>>>>>> for > > >>>>>>>>>>>>>>>>>> immutable > > >>>>>>>>>>>>>>>>>>>>>> keys > > >>>>>>>>>>>>>>>>>>>>>>>>>> in > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current implementation (with few > > changes), I > > >>>>>>>>> didn't > > >>>>>>>>>>>>>>>> consider > > >>>>>>>>>>>>>>>>>>>>>>>>>> backward > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my > mind. > > In > > >>>> both > > >>>>>>>>>>>>> cases, > > >>>>>>>>>>>>>>>>>> user > > >>>>>>>>>>>>>>>>>>>>>>>>>> accesses > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra > type > > >>>>>>>>> parameter > > >>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>>> break > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backwards-compatibility. So user has > to > > >> cast > > >>>> to > > >>>>>>>>>>> actual > > >>>>>>>>>>>>>>>> key > > >>>>>>>>>>>>>>>>>>>>>> type. > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > -- > -Cheers > > Jeyhun > -- -- Guozhang