Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Matthias J. Sax Sun, 28 May 2017 09:22:01 -0700

I don't think we can drop ValueTransformerSupplier. If you don't have an
supplier, you only get a single instance of your function. But for a
stateful transformation, we need multiple instances (one for each task)
of ValueTransformer.


We don't need suppliers for functions like "ValueMapper" etc because
those are stateless and thus, we can reuse a single instance over
multiple tasks -- but we can't do this for ValueTransformer (and similar).

Btw: This reminds me about KIP-159: with regard to the RichFunction we
might need a supplier pattern, too. (I'll comment on the other thread, too.)


-Matthias

On 5/28/17 5:45 AM, Jeyhun Karimov wrote:
> Hi,
> 
> I updated KIP.
> Just to avoid misunderstanding, I meant deprecating  ValueTransformerSupplier
> and I am ok with ValueTransformer.
> So instead of using ValueTransformerSupplier can't we directly use
> ValueTransformer
> or ValueTransformerWithKey?
> 
> Btw, in current design all features of ValueTransformer will be available
> in  ValueTransformerWithKey interface.
> 
> Cheers,
> Jeyhun
> 
> On Sun, May 28, 2017 at 6:15 AM Matthias J. Sax <[email protected]>
> wrote:
> 
>> Thanks Jeyhun.
>>
>> About ValueTransformer: I don't think we can remove it. Note,
>> ValueTransformer allows to attach a state and also allows to register
>> punctuations. Both those features will not be available via withKey()
>> interfaces.
>>
>> -Matthias
>>
>>
>> On 5/27/17 1:25 PM, Jeyhun Karimov wrote:
>>> Hi Matthias,
>>>
>>> Thanks for your comments.
>>>
>>> I tested the deep copy approach. It has significant overhead. Especially
>>> for "light" and stateless operators it slows down the topology
>>> significantly (> 20% ). I think "warning"  users about not-changing the
>> key
>>> is better warning them about possible performance loss.
>>>
>>> About the interfaces, additionally I considered adding
>> InitializerWithKey,
>>> AggregatorWithKey and ValueTransformerWithKey. I think they are included
>> in
>>> PR but not in KIP. I will also include them in KIP, sorry my bad.
>>> Including ReducerWithKey definitely makes sense. Thanks.
>>>
>>> One thing I want to mention is that, maybe we should deprecate methods
>> with
>>> argument type ValueTransformerSupplier (KStream.transformValues(...)) and
>>> and as a whole the ValueTransformerSupplier interface.
>>> We can use ValueTransformer/ValueTransformerWithKey type instead without
>>> additional supplier layer.
>>>
>>>
>>> Cheers,
>>> Jeyhun
>>>
>>>
>>> On Thu, May 25, 2017 at 1:07 AM Matthias J. Sax <[email protected]>
>>> wrote:
>>>
>>>> One more question:
>>>>
>>>> Should we add any of
>>>>  - InitizialierWithKey
>>>>  - ReducerWithKey
>>>>  - ValueTransformerWithKey
>>>>
>>>> To get consistent/complete API, it might be a good idea. Any thoughts?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 5/24/17 3:47 PM, Matthias J. Sax wrote:
>>>>> Jeyhun,
>>>>>
>>>>> I was just wondering if you did look into the key-deep-copy idea we
>>>>> discussed. I am curious to see what the impact might be.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 5/20/17 2:03 AM, Jeyhun Karimov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for your comments. I rethink about including rich functions
>> into
>>>>>> this KIP.
>>>>>> I think once we include rich functions in this KIP and then fix
>>>>>> ProcessorContext in another KIP and incorporate with existing rich
>>>>>> functions, the code will not be backwards compatible.
>>>>>>
>>>>>> I see Damian's and your point more clearly now.
>>>>>>
>>>>>> Conclusion: we include only withKey interfaces in this KIP (I updated
>>>> the
>>>>>> KIP), I will try also initiate another KIP for rich functions.
>>>>>>
>>>>>> Cheers,
>>>>>> Jeyhun
>>>>>>
>>>>>> On Fri, May 19, 2017 at 10:50 PM Matthias J. Sax <
>> [email protected]
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> With the current KIP, using ValueMapper and ValueMapperWithKey
>>>>>>> interfaces, RichFunction seems to be an independent add-on. To fix
>> the
>>>>>>> original issue to allow key access, RichFunctions are not required
>>>> IMHO.
>>>>>>>
>>>>>>> I initially put the RichFunction idea on the table, because I was
>>>> hoping
>>>>>>> to get a uniform API. And I think, is was good to consider them --
>>>>>>> however, the discussion showed that they are not necessary for key
>>>>>>> access. Thus, it seems to be better to handle RichFunctions in an own
>>>>>>> KIP. The ProcessorContext/RecordContext issues seems to be a main
>>>>>>> challenge for this. And introducing RichFunctions with parameter-less
>>>>>>> init() method, seem not to help too much. We would get an
>>>> "intermediate"
>>>>>>> API that we want to change anyway later on...
>>>>>>>
>>>>>>> As you put already much effort into RichFunction, feel free do push
>>>> this
>>>>>>> further and start a new KIP (we can do this even in parallel) -- we
>>>>>>> don't want to slow you down :) But it make the discussion and code
>>>>>>> review easier, if we separate both IMHO.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 5/19/17 2:25 AM, Jeyhun Karimov wrote:
>>>>>>>> Hi Damian,
>>>>>>>>
>>>>>>>> Thanks for your comments. I think providing to users *interface*
>>>> rather
>>>>>>>> than *abstract class* should be preferred (Matthias also raised this
>>>>>>> issue
>>>>>>>> ), anyway I changed the corresponding parts of KIP.
>>>>>>>>
>>>>>>>> Regarding with passing additional contextual information, I think it
>>>> is a
>>>>>>>> tradeoff,
>>>>>>>> 1) first, we fix the context parameter for *init() *method in
>> another
>>>> PR
>>>>>>>> and solve Rich functions afterwards
>>>>>>>> 2) first, we fix the requested issues on jira ([1-3]) with providing
>>>>>>>> (not-complete) Rich functions and integrate the context parameters
>> to
>>>>>>> this
>>>>>>>> afterwards (like improvement)
>>>>>>>>
>>>>>>>> To me, the second approach seems more incremental. However you are
>>>> right,
>>>>>>>> the names might confuse the users.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/KAFKA-4218
>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-4726
>>>>>>>> [3] https://issues.apache.org/jira/browse/KAFKA-3745
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jeyhun
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 19, 2017 at 10:42 AM Damian Guy <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I see you've removed the `ProcessorContext` from the RichFunction
>>>> which
>>>>>>> is
>>>>>>>>> good, but why is it a `RichFunction`? I'd have expected it to pass
>>>> some
>>>>>>>>> additional contextual information, i.e., the `RecordContext` that
>>>>>>> contains
>>>>>>>>> just the topic, partition, timestamp, offset.  I'm ok with it not
>>>>>>> passing
>>>>>>>>> this contextual information, but is the naming incorrect? I'm not
>>>> sure,
>>>>>>>>> tbh. I'm wondering if we should drop `RichFunctions` until we can
>> do
>>>> it
>>>>>>>>> properly with the correct context?
>>>>>>>>>
>>>>>>>>> Also, i don't like the abstract classes: RichValueMapper,
>>>>>>> RichValueJoiner,
>>>>>>>>> RichInitializer etc. Why can't they not just be interfaces?
>>>> Generally we
>>>>>>>>> should provide users with Intefaces to code against, not classes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>> On Fri, 19 May 2017 at 00:50 Jeyhun Karimov <[email protected]>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thanks. I initiated the PR as well, to get a better overview.
>>>>>>>>>>
>>>>>>>>>> The only reason that I used abstract class instead of interface
>> for
>>>>>>> Rich
>>>>>>>>>> functions is that in future if we will have some
>>>> AbstractRichFunction
>>>>>>>>>> abstract classes,
>>>>>>>>>> we can easily extend:
>>>>>>>>>>
>>>>>>>>>> public abstract class RichValueMapper<K, V, VR>  implements
>>>>>>>>>> ValueMapperWithKey<K, V, VR>, RichFunction *extends
>>>>>>>>> AbstractRichFunction*{
>>>>>>>>>> }
>>>>>>>>>>  With interfaces we are only limited to interfaces for
>> inheritance.
>>>>>>>>>>
>>>>>>>>>> However, I think we can easily change it (from abstract class ->
>>>>>>>>> interface)
>>>>>>>>>> if you think interface is a better fit.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jeyhun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax <
>>>> [email protected]
>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the update and explanations. The KIP is quite improved
>>>> now
>>>>>>>>> --
>>>>>>>>>>> great job!
>>>>>>>>>>>
>>>>>>>>>>> One more question: Why are RichValueMapper etc abstract classes
>> and
>>>>>>> not
>>>>>>>>>>> interfaces?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Overall, I like the KIP a lot!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your comments.
>>>>>>>>>>>>
>>>>>>>>>>>> I think supporting Lambdas for `withKey` and
>>>> `AbstractRichFunction`
>>>>>>>>>>>>> don't go together, as Lambdas are only supported for interfaces
>>>>>>>>> AFAIK.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe I misunderstood your comment.
>>>>>>>>>>>> *withKey* and and *withOnlyValue* are interfaces. So they don't
>>>> have
>>>>>>>>>>> direct
>>>>>>>>>>>> relation with *AbstractRichFunction*.
>>>>>>>>>>>> *withKey* and and *withOnlyValue* interfaces have only one
>>>> method ,
>>>>>>>>> so
>>>>>>>>>>> we
>>>>>>>>>>>> can use lambdas.
>>>>>>>>>>>> Where does the *AbstractRichFunction* comes to play? Inside Rich
>>>>>>>>>>> functions.
>>>>>>>>>>>> And we use Rich functions in 2 places:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. User doesn't use rich functions. Just regular *withKey* and
>> and
>>>>>>>>>>>> *withOnlyValue* interfaces(both support lambdas) . We get those
>>>>>>>>>>> interfaces
>>>>>>>>>>>> and wrap into Rich function while building the topology, and
>> send
>>>> to
>>>>>>>>>>>> Processor.
>>>>>>>>>>>> 2. User does use rich functions (Rich functions implement
>>>> *withKey*
>>>>>>>>>>>> interface). As a result no lamdas here by definition. In this
>>>> case,
>>>>>>>>>> while
>>>>>>>>>>>> building the topology we do a type check if the object type is
>>>>>>>>>>>> *withKey* or *RichFunction*.
>>>>>>>>>>>>
>>>>>>>>>>>> So *AbstractRichFunction* is just syntactic sugar for Rich
>>>> functions
>>>>>>>>>> and
>>>>>>>>>>>> does not affect using lambdas.
>>>>>>>>>>>>
>>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to
>>>> have a
>>>>>>>>>>>>> interface approach like this
>>>>>>>>>>>>>   - RichFunction -> only adding init() and close()
>>>>>>>>>>>>>   - ValueMapper
>>>>>>>>>>>>>   - ValueMapperWithKey
>>>>>>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As I said above, currently we support lambdas for *withKey*
>>>>>>>>> interfaces
>>>>>>>>>> as
>>>>>>>>>>>> well.  However, I agree with your idea and I will remove the
>>>>>>>>>>>> AbstractRichFunction from the design.
>>>>>>>>>>>>
>>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to
>>>> support
>>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended
>> API".
>>>>>>>>> For
>>>>>>>>>>>>> this, RichFunction could add key+init+close and
>>>> AbstractRichFunction
>>>>>>>>>>>>> would allow to only care about getting the key.
>>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of more
>>>>>>>>>> overloaded
>>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much
>> because
>>>> we
>>>>>>>>>> have
>>>>>>>>>>>>> already so many overlaods. On the other hand, I do see value in
>>>>>>>>>>>>> supporting Lambdas for `withKey`.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Just to clarify, with current design we have only one extra
>>>>>>>>> overloaded
>>>>>>>>>>>> method per *withOnlyValue* interface:  which is *withKey*
>> version
>>>> of
>>>>>>>>>>>> particular interface.
>>>>>>>>>>>> We don't need extra overload for Rich function as Rich function
>>>>>>>>>>> implements
>>>>>>>>>>>> *withKey* interface as a result they have same type. We
>>>> differentiate
>>>>>>>>>>> them
>>>>>>>>>>>> while building the topology.
>>>>>>>>>>>> We supported lambdas for *withKey* APIs because of the comment:
>>>>>>>>>>>>
>>>>>>>>>>>> @Jeyhun: I did not put any thought into this, but can we have a
>>>>>>>>> design
>>>>>>>>>>>>> that allows for both? Also, with regard to lambdas, it might
>> make
>>>>>>>>>> sense
>>>>>>>>>>>>> to allow for both `V -> newV` and `(K, V) -> newV` ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> However, I don't think that this complicates the overall design
>>>>>>>>>>>> significantly.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Depending on what we want to support, it might make sense to
>>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this
>>>> also
>>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP" before
>>>> driving
>>>>>>>>>>>>> this KIP further.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Based on our discussion I think we should keep Rich functions
>> as I
>>>>>>>>>> don't
>>>>>>>>>>>> think that they bring extra layer of overhead to library.
>>>>>>>>>>>>
>>>>>>>>>>>> Any comments are appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <
>>>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think supporting Lambdas for `withKey` and
>>>> `AbstractRichFunction`
>>>>>>>>>>>>> don't go together, as Lambdas are only supported for interfaces
>>>>>>>>> AFAIK.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to
>>>> have a
>>>>>>>>>>>>> interface approach like this
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - RichFunction -> only adding init() and close()
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - ValueMapper
>>>>>>>>>>>>>   - ValueMapperWithKey
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>>>>>>>>>>>>>
>>>>>>>>>>>>> For this approach, AbstractRichFunction does not make sense
>>>> anymore,
>>>>>>>>>> as
>>>>>>>>>>>>> the only purpose of `RichFunction` is to allow the
>>>> implementation of
>>>>>>>>>>>>> init() and close() -- if you don't want those, you would
>>>> implement a
>>>>>>>>>>>>> different interface (ie, ValueMapperWithKey)
>>>>>>>>>>>>>
>>>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to
>>>> support
>>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended
>> API".
>>>>>>>>> For
>>>>>>>>>>>>> this, RichFunction could add key+init+close and
>>>> AbstractRichFunction
>>>>>>>>>>>>> would allow to only care about getting the key.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of more
>>>>>>>>>> overloaded
>>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much
>> because
>>>> we
>>>>>>>>>> have
>>>>>>>>>>>>> already so many overlaods. On the other hand, I do see value in
>>>>>>>>>>>>> supporting Lambdas for `withKey`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Depending on what we want to support, it might make sense to
>>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this
>>>> also
>>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP" before
>>>> driving
>>>>>>>>>>>>> this KIP further.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry for super late response. Thanks for your comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little
>> bit? I
>>>>>>>>>> cannot
>>>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - From [1] says "A functional interface is an interface that
>> has
>>>>>>>>> just
>>>>>>>>>>> one
>>>>>>>>>>>>>> abstract method, and thus represents a single function
>>>> contract".
>>>>>>>>>>>>>> So basically once we extend some interface from another (in
>> our
>>>>>>>>> case,
>>>>>>>>>>>>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas in
>>>> the
>>>>>>>>>>>>> extended
>>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Further comments:
>>>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe
>> reformat
>>>> the
>>>>>>>>>>> wiki
>>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - I will work on the KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have overlaods
>>>> added
>>>>>>>>>> for
>>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add
>> any
>>>> new
>>>>>>>>>>>>>>> overloads)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Actually there are more than one Processor and public APIs
>> to
>>>> be
>>>>>>>>>>>>>> changed (KStream-KTable
>>>>>>>>>>>>>> joins is one case). However all of them has similar structure:
>>>> we
>>>>>>>>>>>>> overload
>>>>>>>>>>>>>> the *method* with  *methodWithKey*,
>>>>>>>>>>>>>> wrap it into the Rich function, send to processor and inside
>> the
>>>>>>>>>>>>> processor
>>>>>>>>>>>>>> call *init* and *close* methods of the Rich function.
>>>>>>>>>>>>>> As I wrote in KIP, I wanted to demonstrate the overall idea
>> with
>>>>>>>>> only
>>>>>>>>>>>>>> *ValueMapper* as the same can be applied to all changes.
>>>>>>>>>>>>>> Anyway I will update the KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Instead of overriding the *init(ProcessorContext p)* and*
>>>> close()*
>>>>>>>>>>>>> methods
>>>>>>>>>>>>>> in every Rich function with empty body like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Override
>>>>>>>>>>>>>> void init(ProcessorContext context) {}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Override
>>>>>>>>>>>>>> void close () {}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I thought that we can override them once in
>>>> *AbstractRichFunction*
>>>>>>>>>> and
>>>>>>>>>>>>>> extent new Rich functions from *AbstractRichFunction*.
>>>>>>>>>>>>>> Basically this can eliminate code copy-paste and ease the
>>>>>>>>>> maintenance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
>>>>>>>>>> Predicate,
>>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to all,
>> but
>>>> we
>>>>>>>>>>> should
>>>>>>>>>>>>>>> discuss all of them and add where it does make sense (e.g.,
>>>>>>>>>>>>>>> RichForachAction does make sense IMHO)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Definitely agree. As I said, the same technique applies to all
>>>> this
>>>>>>>>>>>>>> interfaces and I didn't want to explode the KIP, just wanted
>> to
>>>>>>>>> give
>>>>>>>>>>> the
>>>>>>>>>>>>>> overall intuition.
>>>>>>>>>>>>>> However, I will update the KIP as I said.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
>>>>>>>>>>> `RichValueXX`
>>>>>>>>>>>>>>> in general -- but why can't we do all this with interfaces
>>>> only?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sure we can. However the main intuition is we should not force
>>>>>>>>> users
>>>>>>>>>> to
>>>>>>>>>>>>>> implement *init(ProcessorContext)* and *close()* functions
>> every
>>>>>>>>> time
>>>>>>>>>>>>> they
>>>>>>>>>>>>>> use Rich functions.
>>>>>>>>>>>>>> If one needs, she can override the respective methods.
>> However,
>>>> I
>>>>>>>>> am
>>>>>>>>>>> open
>>>>>>>>>>>>>> for discussion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
>>>>>>>>> further
>>>>>>>>>>> than
>>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is done
>>>> before
>>>>>>>>>>> this?
>>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too
>> large.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That is good point. I wanted to make *init(ProcessorContext)*
>>>>>>>>> method
>>>>>>>>>>>>>> persistent among the library (which use ProcessorContext as an
>>>>>>>>>> input),
>>>>>>>>>>>>>> therefore I put *ProcessorContext* as an input.
>>>>>>>>>>>>>> So the important question is that (as @dguy and @mjsax
>>>> mentioned)
>>>>>>>>>>> whether
>>>>>>>>>>>>>> continue this KIP without providing users an access to
>>>>>>>>>>> *ProcessorContext*
>>>>>>>>>>>>>> (change *init (ProcessorContext)* to * init()* ) or
>>>>>>>>>>>>>> initiate another KIP before this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
>>>>>>>>>>>>> se-8-jls-pfd-diffs.pdf
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <
>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
>>>>>>>>> further
>>>>>>>>>>>>> than
>>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is done
>>>> before
>>>>>>>>>>> this?
>>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too
>> large.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <
>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree that that `ProcessorContext` interface is too broad
>> in
>>>>>>>>>>> general
>>>>>>>>>>>>>>>> -- this is even true for transform/process, and it's also
>>>>>>>>> reflected
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> the API improvement list we want to do.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>>>>>>>>>>> Kafka+Streams+Discussions
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So I am wondering, if you question the `RichFunction`
>>>> approach in
>>>>>>>>>>>>>>>> general? Or if you suggest to either extend the scope of
>> this
>>>> KIP
>>>>>>>>>> to
>>>>>>>>>>>>>>>> include this---or maybe better, do another KIP for it and
>>>> delay
>>>>>>>>>> this
>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>> until the other one is done?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm not convinced on the `RichFunction` approach. Do we
>>>> really
>>>>>>>>>> want
>>>>>>>>>>> to
>>>>>>>>>>>>>>>> give
>>>>>>>>>>>>>>>>> every DSL method access to the `ProcessorContext` ? It has
>> a
>>>>>>>>> bunch
>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> methods on it that seem in-appropriate for some of the DSL
>>>>>>>>>> methods,
>>>>>>>>>>>>>>> i.e,
>>>>>>>>>>>>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. It
>> is
>>>>>>>>> far
>>>>>>>>>>> too
>>>>>>>>>>>>>>>>> broad. I think it would be better to have a narrower
>>>> interface
>>>>>>>>>> like
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> `RecordContext`  - remembering it is easier to add
>>>>>>>>>>> methods/interfaces
>>>>>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>> than to remove them
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <
>>>>>>>>>> [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little
>>>> bit?
>>>>>>>>> I
>>>>>>>>>>>>>>> cannot
>>>>>>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem
>>>> is.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For updating the KIP title I don't know -- guess it's up
>> to
>>>>>>>>> you.
>>>>>>>>>>>>>>> Maybe a
>>>>>>>>>>>>>>>>>> committer can comment on this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Further comments:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe
>>>> reformat
>>>>>>>>>> the
>>>>>>>>>>>>>>> wiki
>>>>>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have
>> overlaods
>>>>>>>>>> added
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add
>>>> any
>>>>>>>>>> new
>>>>>>>>>>>>>>>>>> overloads)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction,
>> Merger,
>>>>>>>>>>>>>>> Predicate,
>>>>>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to all,
>>>> but
>>>>>>>>> we
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>> discuss all of them and add where it does make sense
>> (e.g.,
>>>>>>>>>>>>>>>>>> RichForachAction does make sense IMHO)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
>>>>>>>>>>>>>>> `RichValueXX`
>>>>>>>>>>>>>>>>>> in general -- but why can't we do all this with interfaces
>>>>>>>>> only?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for your comments. I think we cannot extend the
>> two
>>>>>>>>>>>>> interfaces
>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I
>> should
>>>>>>>>>> change
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> title, because now we are not limiting the KIP to only
>>>>>>>>>>> ValueMapper,
>>>>>>>>>>>>>>>>>>> ValueTransformer and ValueJoiner.
>>>>>>>>>>>>>>>>>>> Please feel free to comment.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't
>>>> need
>>>>>>>>> the
>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>> overlaod.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> And yes, we need to do one check -- but this happens
>> when
>>>>>>>>>>> building
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> topology. At runtime (I mean after KafkaStream#start()
>> we
>>>>>>>>> don't
>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>>>>>> check as we will always use `ValueMapperWithKey`)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for feedback.
>>>>>>>>>>>>>>>>>>>>> Then we need to overload method
>>>>>>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V,
>> ?
>>>>>>>>>> extends
>>>>>>>>>>>>>>> VR>
>>>>>>>>>>>>>>>>>>>>> mapper);
>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<?
>>>> super
>>>>>>>>> V,
>>>>>>>>>> ?
>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>> VR>
>>>>>>>>>>>>>>>>>>>>> mapper);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and in runtime (inside processor) we still have to
>> check
>>>> it
>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> ValueMapper
>>>>>>>>>>>>>>>>>>>>> or ValueMapperWithKey before wrapping it into the rich
>>>>>>>>>> function.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +1 :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I was reading the updated KIP and I am wondering, if
>> we
>>>>>>>>>> should
>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> design a little different.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Instead of distinguishing between a RichFunction and
>>>>>>>>>>>>>>>> non-RichFunction
>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>> runtime level, we would use RichFunctions all the
>> time.
>>>>>>>>>> Thus,
>>>>>>>>>>> on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> DSL
>>>>>>>>>>>>>>>>>>>>>>> entry level, if a user provides a non-RichFunction,
>> we
>>>>>>>>> wrap
>>>>>>>>>> it
>>>>>>>>>>>>>>> by a
>>>>>>>>>>>>>>>>>>>>>>> RichFunction that is fully implemented by Streams.
>> This
>>>>>>>>>>>>>>>> RichFunction
>>>>>>>>>>>>>>>>>>>>>>> would just forward the call omitting the key:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Just to sketch the idea (incomplete code snippet):
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> public StreamsRichValueMapper implements
>>>>>>>>> RichValueMapper()
>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
>>>>>>>>>>>>> constructor
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>    public VR apply(final K key, final V1 value1,
>>>> final V2
>>>>>>>>>>>>>>> value2)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>        return userProvidedMapper(value1, value2);
>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  From a performance point of view, I am not sure if
>> the
>>>>>>>>>>>>>>>>>>>>>>> "if(isRichFunction)" including casts etc would have
>>>> more
>>>>>>>>>>>>> overhead
>>>>>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>>>>>>>> this approach (we would do more nested method call
>> for
>>>>>>>>>>>>>>>>>> non-RichFunction
>>>>>>>>>>>>>>>>>>>>>>> which should be more common than RichFunctions).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This approach should not effect lambdas (or do I miss
>>>>>>>>>>>>> something?)
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> might be cleaner, as we could have one more top level
>>>>>>>>>>> interface
>>>>>>>>>>>>>>>>>>>>>>> `RichFunction` with methods `init()` and `close()`
>> and
>>>>>>>>> also
>>>>>>>>>>>>>>>>>> interfaces
>>>>>>>>>>>>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes
>>>>>>>>>>> required).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to
>> include
>>>>>>>>> rich
>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>>> will still have to evaluate the cost of deep copying
>>>> of
>>>>>>>>>> keys.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> My opinion would be that documenting the
>>>> immutability of
>>>>>>>>>> the
>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> best approach available.  Better than requiring the
>>>> key
>>>>>>>>> to
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> serializable
>>>>>>>>>>>>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no
>>>> performance
>>>>>>>>>> risk.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> It'd be different if Java had immutable type
>>>> constraints
>>>>>>>>>> of
>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>> kind.
>>>>>>>>>>>>>>>>>>>>>> :-)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this path,
>>>> it
>>>>>>>>>>> should
>>>>>>>>>>>>>>>> cover
>>>>>>>>>>>>>>>>>>>>>>>>>> all(?) DSL interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> About guarding the key -- I am still not sure what
>>>> to
>>>>>>>>> do
>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>> it...
>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it might be enough to document it (and name
>>>> the
>>>>>>>>> key
>>>>>>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately,
>> I
>>>>>>>>> would
>>>>>>>>>>>>>>> prefer
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> guard against any modification, but I have no good
>>>> idea
>>>>>>>>>> how
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>> Not sure what others think about the risk of
>>>> corrupted
>>>>>>>>>>>>>>>>>> partitioning
>>>>>>>>>>>>>>>>>>>>>>>>>> (what would be a user error and we could say,
>> well,
>>>> bad
>>>>>>>>>>> luck,
>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>> got
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs deep
>>>> copy
>>>>>>>>>> with
>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> potential
>>>>>>>>>>>>>>>>>>>>>>>>>> performance hit (that we can't quantity atm
>> without
>>>> any
>>>>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>>>>>>>>>> test).
>>>>>>>>>>>>>>>>>>>>>>>>>> We do have a performance system test. Maybe it's
>>>> worth
>>>>>>>>>> for
>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> apply
>>>>>>>>>>>>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's very
>>>>>>>>> basic
>>>>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>>>>>>>>>>> test only, but might give some insight. If you
>> want
>>>> to
>>>>>>>>> do
>>>>>>>>>>>>>>> this,
>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>>>>>> into folder "tests" for general test setup, and
>> into
>>>>>>>>>>>>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find
>>>> the
>>>>>>>>>> perf
>>>>>>>>>>>>>>>> test.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think extending KIP to include RichFunctions
>>>> totally
>>>>>>>>>>>>> makes
>>>>>>>>>>>>>>>>>>>> sense.
>>>>>>>>>>>>>>>>>>>>>>>>> So,
>>>>>>>>>>>>>>>>>>>>>>>>>>>   we don't want to guard the keys because it is
>>>>>>>>> costly.
>>>>>>>>>>>>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it should
>>>> not be
>>>>>>>>>>>>>>> limited
>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>>>>> the 3
>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of
>>>>>>>>>>> interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> One follow up. There was this email on the user
>>>> list:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
>>>>>>>>>>>>>>> key+
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might make sense so include Initializer,
>> Adder,
>>>>>>>>> and
>>>>>>>>>>>>>>>>>> Substractor
>>>>>>>>>>>>>>>>>>>>>>>>>>>> inferface, too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> And we should double check if there are other
>>>>>>>>> interface
>>>>>>>>>>> we
>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>>> miss
>>>>>>>>>>>>>>>>>>>>>>>>>> atm.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I
>> am
>>>>>>>>>>> actually
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> little
>>>>>>>>>>>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worried about performance impact... We might
>>>> want to
>>>>>>>>>> do
>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quantify this impact.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of
>>>> `RichFunction`
>>>>>>>>>>>>>>>> interface
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would allow users to access record metadata
>> (like
>>>>>>>>>>>>>>> timestamp,
>>>>>>>>>>>>>>>>>>>>>> offset,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a
>>>> similar
>>>>>>>>>>>>> concept.
>>>>>>>>>>>>>>>>>>>> Thus, I
>>>>>>>>>>>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge
>> the
>>>>>>>>> scope
>>>>>>>>>>> of
>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that? WDYT?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar
>> approach
>>>>>>>>> and
>>>>>>>>>>>>>>> updated
>>>>>>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in
>>>> Processors
>>>>>>>>>>>>>>> sending
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> copy
>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual key.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I
>>>> think
>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> bottleneck
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in some use-cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu
>> Fenniak <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper
>>>> (...etc) to
>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> classes,
>>>>>>>>>>>>>>>>>>>>>>>>> rather
>>>>>>>>>>>>>>>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards
>>>> incompatible
>>>>>>>>>>>>>>> change.
>>>>>>>>>>>>>>>>>> An
>>>>>>>>>>>>>>>>>>>>>>>>>>>> alternative
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach that would be backwards compatible
>>>> would
>>>>>>>>> be
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> define
>>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those
>>>>>>>>>>> interfaces
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>> used.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as
>>>> "final"
>>>>>>>>>>>>>>> doesn't
>>>>>>>>>>>>>>>>>>>> change
>>>>>>>>>>>>>>>>>>>>>>>>>> much
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about guarding against key change.  It only
>>>>>>>>> prevents
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> variable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a
>>>>>>>>> mutable
>>>>>>>>>>>>>>> object
>>>>>>>>>>>>>>>>>>>> (eg.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> byte[]),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).
>> But
>>>>>>>>> I'm
>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>> really
>>>>>>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> much that can be done about that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun
>> Karimov
>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can
>>>> guard
>>>>>>>>> for
>>>>>>>>>>>>>>>>>> immutable
>>>>>>>>>>>>>>>>>>>>>> keys
>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current implementation (with few changes), I
>>>>>>>>> didn't
>>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>>>>>>>>>> backward
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In
>>>> both
>>>>>>>>>>>>> cases,
>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>> accesses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type
>>>>>>>>> parameter
>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>> break
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to
>> cast
>>>> to
>>>>>>>>>>> actual
>>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>>>>>>>>> type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method
>> with
>>>> 2
>>>>>>>>>>>>> argument
>>>>>>>>>>>>>>>>>> (key
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> value)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing
>> this,  I
>>>>>>>>>> think
>>>>>>>>>>> we
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against
>>>> key
>>>>>>>>>>> change.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess
>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> public class KeyAccess {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      Object key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object
>>>> key) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>          this.key = key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      public Object getKey() {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>          return key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
>>>>>>>>>>>>>>>>>>>> *ValueTransformer*
>>>>>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
>>>>>>>>>>>>>>>>>>>>>>>>>> *KTableMapValuesProcessor*)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can
>>>> set
>>>>>>>>> the
>>>>>>>>>>>>>>> *key*
>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user
>>>> can
>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>> *getKey()*
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J.
>>>> Sax <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think there are two issues we need to
>>>> address:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward
>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>>> Users
>>>>>>>>>>>>>>>>>>>> did
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> complain
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about this in past releases already, and as
>>>> the
>>>>>>>>>> user
>>>>>>>>>>>>>>> base
>>>>>>>>>>>>>>>>>>>>>> grows,
>>>>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should not break backward compatibility in
>>>>>>>>> future
>>>>>>>>>>>>>>>> releases
>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to
>>>> allow
>>>>>>>>> key
>>>>>>>>>>>>>>>> access.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same
>>>> direction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile
>>>>>>>>>> failures
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate.
>> :-)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no
>>>> guard
>>>>>>>>> to
>>>>>>>>>>>>>>> prevent
>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt
>>>> partitioning
>>>>>>>>> if
>>>>>>>>>>>>>>> users
>>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>>>>>> alter
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to