Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Jeyhun Karimov Sat, 27 May 2017 13:26:38 -0700

Hi Matthias,

Thanks for your comments.


I tested the deep copy approach. It has significant overhead. Especially
for "light" and stateless operators it slows down the topology
significantly (> 20% ). I think "warning"  users about not-changing the key
is better warning them about possible performance loss.

About the interfaces, additionally I considered adding InitializerWithKey,
AggregatorWithKey and ValueTransformerWithKey. I think they are included in
PR but not in KIP. I will also include them in KIP, sorry my bad.
Including ReducerWithKey definitely makes sense. Thanks.

One thing I want to mention is that, maybe we should deprecate methods with
argument type ValueTransformerSupplier (KStream.transformValues(...)) and
and as a whole the ValueTransformerSupplier interface.
We can use ValueTransformer/ValueTransformerWithKey type instead without
additional supplier layer.


Cheers,
Jeyhun


On Thu, May 25, 2017 at 1:07 AM Matthias J. Sax <matth...@confluent.io>
wrote:

> One more question:
>
> Should we add any of
>  - InitizialierWithKey
>  - ReducerWithKey
>  - ValueTransformerWithKey
>
> To get consistent/complete API, it might be a good idea. Any thoughts?
>
>
> -Matthias
>
>
> On 5/24/17 3:47 PM, Matthias J. Sax wrote:
> > Jeyhun,
> >
> > I was just wondering if you did look into the key-deep-copy idea we
> > discussed. I am curious to see what the impact might be.
> >
> >
> > -Matthias
> >
> > On 5/20/17 2:03 AM, Jeyhun Karimov wrote:
> >> Hi,
> >>
> >> Thanks for your comments. I rethink about including rich functions into
> >> this KIP.
> >> I think once we include rich functions in this KIP and then fix
> >> ProcessorContext in another KIP and incorporate with existing rich
> >> functions, the code will not be backwards compatible.
> >>
> >> I see Damian's and your point more clearly now.
> >>
> >> Conclusion: we include only withKey interfaces in this KIP (I updated
> the
> >> KIP), I will try also initiate another KIP for rich functions.
> >>
> >> Cheers,
> >> Jeyhun
> >>
> >> On Fri, May 19, 2017 at 10:50 PM Matthias J. Sax <matth...@confluent.io
> >
> >> wrote:
> >>
> >>> With the current KIP, using ValueMapper and ValueMapperWithKey
> >>> interfaces, RichFunction seems to be an independent add-on. To fix the
> >>> original issue to allow key access, RichFunctions are not required
> IMHO.
> >>>
> >>> I initially put the RichFunction idea on the table, because I was
> hoping
> >>> to get a uniform API. And I think, is was good to consider them --
> >>> however, the discussion showed that they are not necessary for key
> >>> access. Thus, it seems to be better to handle RichFunctions in an own
> >>> KIP. The ProcessorContext/RecordContext issues seems to be a main
> >>> challenge for this. And introducing RichFunctions with parameter-less
> >>> init() method, seem not to help too much. We would get an
> "intermediate"
> >>> API that we want to change anyway later on...
> >>>
> >>> As you put already much effort into RichFunction, feel free do push
> this
> >>> further and start a new KIP (we can do this even in parallel) -- we
> >>> don't want to slow you down :) But it make the discussion and code
> >>> review easier, if we separate both IMHO.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 5/19/17 2:25 AM, Jeyhun Karimov wrote:
> >>>> Hi Damian,
> >>>>
> >>>> Thanks for your comments. I think providing to users *interface*
> rather
> >>>> than *abstract class* should be preferred (Matthias also raised this
> >>> issue
> >>>> ), anyway I changed the corresponding parts of KIP.
> >>>>
> >>>> Regarding with passing additional contextual information, I think it
> is a
> >>>> tradeoff,
> >>>> 1) first, we fix the context parameter for *init() *method in another
> PR
> >>>> and solve Rich functions afterwards
> >>>> 2) first, we fix the requested issues on jira ([1-3]) with providing
> >>>> (not-complete) Rich functions and integrate the context parameters to
> >>> this
> >>>> afterwards (like improvement)
> >>>>
> >>>> To me, the second approach seems more incremental. However you are
> right,
> >>>> the names might confuse the users.
> >>>>
> >>>>
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/KAFKA-4218
> >>>> [2] https://issues.apache.org/jira/browse/KAFKA-4726
> >>>> [3] https://issues.apache.org/jira/browse/KAFKA-3745
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Jeyhun
> >>>>
> >>>>
> >>>> On Fri, May 19, 2017 at 10:42 AM Damian Guy <damian....@gmail.com>
> >>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I see you've removed the `ProcessorContext` from the RichFunction
> which
> >>> is
> >>>>> good, but why is it a `RichFunction`? I'd have expected it to pass
> some
> >>>>> additional contextual information, i.e., the `RecordContext` that
> >>> contains
> >>>>> just the topic, partition, timestamp, offset.  I'm ok with it not
> >>> passing
> >>>>> this contextual information, but is the naming incorrect? I'm not
> sure,
> >>>>> tbh. I'm wondering if we should drop `RichFunctions` until we can do
> it
> >>>>> properly with the correct context?
> >>>>>
> >>>>> Also, i don't like the abstract classes: RichValueMapper,
> >>> RichValueJoiner,
> >>>>> RichInitializer etc. Why can't they not just be interfaces?
> Generally we
> >>>>> should provide users with Intefaces to code against, not classes.
> >>>>>
> >>>>> Thanks,
> >>>>> Damian
> >>>>>
> >>>>> On Fri, 19 May 2017 at 00:50 Jeyhun Karimov <je.kari...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Thanks. I initiated the PR as well, to get a better overview.
> >>>>>>
> >>>>>> The only reason that I used abstract class instead of interface for
> >>> Rich
> >>>>>> functions is that in future if we will have some
> AbstractRichFunction
> >>>>>> abstract classes,
> >>>>>> we can easily extend:
> >>>>>>
> >>>>>> public abstract class RichValueMapper<K, V, VR>  implements
> >>>>>> ValueMapperWithKey<K, V, VR>, RichFunction *extends
> >>>>> AbstractRichFunction*{
> >>>>>> }
> >>>>>>  With interfaces we are only limited to interfaces for inheritance.
> >>>>>>
> >>>>>> However, I think we can easily change it (from abstract class ->
> >>>>> interface)
> >>>>>> if you think interface is a better fit.
> >>>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Jeyhun
> >>>>>>
> >>>>>>
> >>>>>> On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax <
> matth...@confluent.io
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the update and explanations. The KIP is quite improved
> now
> >>>>> --
> >>>>>>> great job!
> >>>>>>>
> >>>>>>> One more question: Why are RichValueMapper etc abstract classes and
> >>> not
> >>>>>>> interfaces?
> >>>>>>>
> >>>>>>>
> >>>>>>> Overall, I like the KIP a lot!
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Thanks for your comments.
> >>>>>>>>
> >>>>>>>> I think supporting Lambdas for `withKey` and
> `AbstractRichFunction`
> >>>>>>>>> don't go together, as Lambdas are only supported for interfaces
> >>>>> AFAIK.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Maybe I misunderstood your comment.
> >>>>>>>> *withKey* and and *withOnlyValue* are interfaces. So they don't
> have
> >>>>>>> direct
> >>>>>>>> relation with *AbstractRichFunction*.
> >>>>>>>> *withKey* and and *withOnlyValue* interfaces have only one
> method ,
> >>>>> so
> >>>>>>> we
> >>>>>>>> can use lambdas.
> >>>>>>>> Where does the *AbstractRichFunction* comes to play? Inside Rich
> >>>>>>> functions.
> >>>>>>>> And we use Rich functions in 2 places:
> >>>>>>>>
> >>>>>>>> 1. User doesn't use rich functions. Just regular *withKey* and and
> >>>>>>>> *withOnlyValue* interfaces(both support lambdas) . We get those
> >>>>>>> interfaces
> >>>>>>>> and wrap into Rich function while building the topology, and send
> to
> >>>>>>>> Processor.
> >>>>>>>> 2. User does use rich functions (Rich functions implement
> *withKey*
> >>>>>>>> interface). As a result no lamdas here by definition. In this
> case,
> >>>>>> while
> >>>>>>>> building the topology we do a type check if the object type is
> >>>>>>>> *withKey* or *RichFunction*.
> >>>>>>>>
> >>>>>>>> So *AbstractRichFunction* is just syntactic sugar for Rich
> functions
> >>>>>> and
> >>>>>>>> does not affect using lambdas.
> >>>>>>>>
> >>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to
> have a
> >>>>>>>>> interface approach like this
> >>>>>>>>>   - RichFunction -> only adding init() and close()
> >>>>>>>>>   - ValueMapper
> >>>>>>>>>   - ValueMapperWithKey
> >>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> As I said above, currently we support lambdas for *withKey*
> >>>>> interfaces
> >>>>>> as
> >>>>>>>> well.  However, I agree with your idea and I will remove the
> >>>>>>>> AbstractRichFunction from the design.
> >>>>>>>>
> >>>>>>>> As an alternative, we could argue, that it is sufficient to
> support
> >>>>>>>>> Lambdas for the "plain" API only, but not for any "extended API".
> >>>>> For
> >>>>>>>>> this, RichFunction could add key+init+close and
> AbstractRichFunction
> >>>>>>>>> would allow to only care about getting the key.
> >>>>>>>>> Not sure, which one is better. I don't like the idea of more
> >>>>>> overloaded
> >>>>>>>>> methods to get Lambdas for `withKey` interfaces too much because
> we
> >>>>>> have
> >>>>>>>>> already so many overlaods. On the other hand, I do see value in
> >>>>>>>>> supporting Lambdas for `withKey`.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Just to clarify, with current design we have only one extra
> >>>>> overloaded
> >>>>>>>> method per *withOnlyValue* interface:  which is *withKey* version
> of
> >>>>>>>> particular interface.
> >>>>>>>> We don't need extra overload for Rich function as Rich function
> >>>>>>> implements
> >>>>>>>> *withKey* interface as a result they have same type. We
> differentiate
> >>>>>>> them
> >>>>>>>> while building the topology.
> >>>>>>>> We supported lambdas for *withKey* APIs because of the comment:
> >>>>>>>>
> >>>>>>>> @Jeyhun: I did not put any thought into this, but can we have a
> >>>>> design
> >>>>>>>>> that allows for both? Also, with regard to lambdas, it might make
> >>>>>> sense
> >>>>>>>>> to allow for both `V -> newV` and `(K, V) -> newV` ?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> However, I don't think that this complicates the overall design
> >>>>>>>> significantly.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Depending on what we want to support, it might make sense to
> >>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this
> also
> >>>>>>>>> determines if we should have a "ProcessorContext KIP" before
> driving
> >>>>>>>>> this KIP further.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Based on our discussion I think we should keep Rich functions as I
> >>>>>> don't
> >>>>>>>> think that they bring extra layer of overhead to library.
> >>>>>>>>
> >>>>>>>> Any comments are appreciated.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <
> >>>>>> matth...@confluent.io>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Jeyhun,
> >>>>>>>>>
> >>>>>>>>> thanks for the update.
> >>>>>>>>>
> >>>>>>>>> I think supporting Lambdas for `withKey` and
> `AbstractRichFunction`
> >>>>>>>>> don't go together, as Lambdas are only supported for interfaces
> >>>>> AFAIK.
> >>>>>>>>>
> >>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to
> have a
> >>>>>>>>> interface approach like this
> >>>>>>>>>
> >>>>>>>>>   - RichFunction -> only adding init() and close()
> >>>>>>>>>
> >>>>>>>>>   - ValueMapper
> >>>>>>>>>   - ValueMapperWithKey
> >>>>>>>>>
> >>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
> >>>>>>>>>
> >>>>>>>>> For this approach, AbstractRichFunction does not make sense
> anymore,
> >>>>>> as
> >>>>>>>>> the only purpose of `RichFunction` is to allow the
> implementation of
> >>>>>>>>> init() and close() -- if you don't want those, you would
> implement a
> >>>>>>>>> different interface (ie, ValueMapperWithKey)
> >>>>>>>>>
> >>>>>>>>> As an alternative, we could argue, that it is sufficient to
> support
> >>>>>>>>> Lambdas for the "plain" API only, but not for any "extended API".
> >>>>> For
> >>>>>>>>> this, RichFunction could add key+init+close and
> AbstractRichFunction
> >>>>>>>>> would allow to only care about getting the key.
> >>>>>>>>>
> >>>>>>>>> Not sure, which one is better. I don't like the idea of more
> >>>>>> overloaded
> >>>>>>>>> methods to get Lambdas for `withKey` interfaces too much because
> we
> >>>>>> have
> >>>>>>>>> already so many overlaods. On the other hand, I do see value in
> >>>>>>>>> supporting Lambdas for `withKey`.
> >>>>>>>>>
> >>>>>>>>> Depending on what we want to support, it might make sense to
> >>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this
> also
> >>>>>>>>> determines if we should have a "ProcessorContext KIP" before
> driving
> >>>>>>>>> this KIP further.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Sorry for super late response. Thanks for your comments.
> >>>>>>>>>>
> >>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little bit? I
> >>>>>> cannot
> >>>>>>>>>>> follow the explanation in the KIP to see what the problem is.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> - From [1] says "A functional interface is an interface that has
> >>>>> just
> >>>>>>> one
> >>>>>>>>>> abstract method, and thus represents a single function
> contract".
> >>>>>>>>>> So basically once we extend some interface from another (in our
> >>>>> case,
> >>>>>>>>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas in
> the
> >>>>>>>>> extended
> >>>>>>>>>> interface.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Further comments:
> >>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe reformat
> the
> >>>>>>> wiki
> >>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> - I will work on the KIP.
> >>>>>>>>>>
> >>>>>>>>>>  - What about KStream-KTable joins? You don't have overlaods
> added
> >>>>>> for
> >>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add any
> new
> >>>>>>>>>>> overloads)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> - Actually there are more than one Processor and public APIs to
> be
> >>>>>>>>>> changed (KStream-KTable
> >>>>>>>>>> joins is one case). However all of them has similar structure:
> we
> >>>>>>>>> overload
> >>>>>>>>>> the *method* with  *methodWithKey*,
> >>>>>>>>>> wrap it into the Rich function, send to processor and inside the
> >>>>>>>>> processor
> >>>>>>>>>> call *init* and *close* methods of the Rich function.
> >>>>>>>>>> As I wrote in KIP, I wanted to demonstrate the overall idea with
> >>>>> only
> >>>>>>>>>> *ValueMapper* as the same can be applied to all changes.
> >>>>>>>>>> Anyway I will update the KIP.
> >>>>>>>>>>
> >>>>>>>>>>  - Why do we need `AbstractRichFunction`?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Instead of overriding the *init(ProcessorContext p)* and*
> close()*
> >>>>>>>>> methods
> >>>>>>>>>> in every Rich function with empty body like:
> >>>>>>>>>>
> >>>>>>>>>> @Override
> >>>>>>>>>> void init(ProcessorContext context) {}
> >>>>>>>>>>
> >>>>>>>>>> @Override
> >>>>>>>>>> void close () {}
> >>>>>>>>>>
> >>>>>>>>>> I thought that we can override them once in
> *AbstractRichFunction*
> >>>>>> and
> >>>>>>>>>> extent new Rich functions from *AbstractRichFunction*.
> >>>>>>>>>> Basically this can eliminate code copy-paste and ease the
> >>>>>> maintenance.
> >>>>>>>>>>
> >>>>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
> >>>>>> Predicate,
> >>>>>>>>>>> Reducer? I don't want to say we should/need to add to all, but
> we
> >>>>>>> should
> >>>>>>>>>>> discuss all of them and add where it does make sense (e.g.,
> >>>>>>>>>>> RichForachAction does make sense IMHO)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Definitely agree. As I said, the same technique applies to all
> this
> >>>>>>>>>> interfaces and I didn't want to explode the KIP, just wanted to
> >>>>> give
> >>>>>>> the
> >>>>>>>>>> overall intuition.
> >>>>>>>>>> However, I will update the KIP as I said.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> >>>>>>> `RichValueXX`
> >>>>>>>>>>> in general -- but why can't we do all this with interfaces
> only?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Sure we can. However the main intuition is we should not force
> >>>>> users
> >>>>>> to
> >>>>>>>>>> implement *init(ProcessorContext)* and *close()* functions every
> >>>>> time
> >>>>>>>>> they
> >>>>>>>>>> use Rich functions.
> >>>>>>>>>> If one needs, she can override the respective methods. However,
> I
> >>>>> am
> >>>>>>> open
> >>>>>>>>>> for discussion.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
> >>>>> further
> >>>>>>> than
> >>>>>>>>>>> it currently is. So maybe we need another KIP that is done
> before
> >>>>>>> this?
> >>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too large.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> That is good point. I wanted to make *init(ProcessorContext)*
> >>>>> method
> >>>>>>>>>> persistent among the library (which use ProcessorContext as an
> >>>>>> input),
> >>>>>>>>>> therefore I put *ProcessorContext* as an input.
> >>>>>>>>>> So the important question is that (as @dguy and @mjsax
> mentioned)
> >>>>>>> whether
> >>>>>>>>>> continue this KIP without providing users an access to
> >>>>>>> *ProcessorContext*
> >>>>>>>>>> (change *init (ProcessorContext)* to * init()* ) or
> >>>>>>>>>> initiate another KIP before this.
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
> >>>>>>>>> se-8-jls-pfd-diffs.pdf
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Jeyhun
> >>>>>>>>>>
> >>>>>>>>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <
> damian....@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
> >>>>> further
> >>>>>>>>> than
> >>>>>>>>>>> it currently is. So maybe we need another KIP that is done
> before
> >>>>>>> this?
> >>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too large.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <
> >>>>> matth...@confluent.io
> >>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I agree that that `ProcessorContext` interface is too broad in
> >>>>>>> general
> >>>>>>>>>>>> -- this is even true for transform/process, and it's also
> >>>>> reflected
> >>>>>>> in
> >>>>>>>>>>>> the API improvement list we want to do.
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>>>>>>>>>> Kafka+Streams+Discussions
> >>>>>>>>>>>>
> >>>>>>>>>>>> So I am wondering, if you question the `RichFunction`
> approach in
> >>>>>>>>>>>> general? Or if you suggest to either extend the scope of this
> KIP
> >>>>>> to
> >>>>>>>>>>>> include this---or maybe better, do another KIP for it and
> delay
> >>>>>> this
> >>>>>>>>> KIP
> >>>>>>>>>>>> until the other one is done?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
> >>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not convinced on the `RichFunction` approach. Do we
> really
> >>>>>> want
> >>>>>>> to
> >>>>>>>>>>>> give
> >>>>>>>>>>>>> every DSL method access to the `ProcessorContext` ? It has a
> >>>>> bunch
> >>>>>>> of
> >>>>>>>>>>>>> methods on it that seem in-appropriate for some of the DSL
> >>>>>> methods,
> >>>>>>>>>>> i.e,
> >>>>>>>>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. It is
> >>>>> far
> >>>>>>> too
> >>>>>>>>>>>>> broad. I think it would be better to have a narrower
> interface
> >>>>>> like
> >>>>>>>>> the
> >>>>>>>>>>>>> `RecordContext`  - remembering it is easier to add
> >>>>>>> methods/interfaces
> >>>>>>>>>>>> later
> >>>>>>>>>>>>> than to remove them
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <
> >>>>>> matth...@confluent.io
> >>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jeyhun,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little
> bit?
> >>>>> I
> >>>>>>>>>>> cannot
> >>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem
> is.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For updating the KIP title I don't know -- guess it's up to
> >>>>> you.
> >>>>>>>>>>> Maybe a
> >>>>>>>>>>>>>> committer can comment on this?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Further comments:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe
> reformat
> >>>>>> the
> >>>>>>>>>>> wiki
> >>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have overlaods
> >>>>>> added
> >>>>>>>>> for
> >>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add
> any
> >>>>>> new
> >>>>>>>>>>>>>> overloads)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
> >>>>>>>>>>> Predicate,
> >>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to all,
> but
> >>>>> we
> >>>>>>>>>>> should
> >>>>>>>>>>>>>> discuss all of them and add where it does make sense (e.g.,
> >>>>>>>>>>>>>> RichForachAction does make sense IMHO)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> >>>>>>>>>>> `RichValueXX`
> >>>>>>>>>>>>>> in general -- but why can't we do all this with interfaces
> >>>>> only?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for your comments. I think we cannot extend the two
> >>>>>>>>> interfaces
> >>>>>>>>>>>> if
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I should
> >>>>>> change
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> title, because now we are not limiting the KIP to only
> >>>>>>> ValueMapper,
> >>>>>>>>>>>>>>> ValueTransformer and ValueJoiner.
> >>>>>>>>>>>>>>> Please feel free to comment.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
> >>>>>>>>>>> matth...@confluent.io>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't
> need
> >>>>> the
> >>>>>>> new
> >>>>>>>>>>>>>>>> overlaod.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And yes, we need to do one check -- but this happens when
> >>>>>>> building
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>> topology. At runtime (I mean after KafkaStream#start() we
> >>>>> don't
> >>>>>>>>> need
> >>>>>>>>>>>> any
> >>>>>>>>>>>>>>>> check as we will always use `ValueMapperWithKey`)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for feedback.
> >>>>>>>>>>>>>>>>> Then we need to overload method
> >>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ?
> >>>>>> extends
> >>>>>>>>>>> VR>
> >>>>>>>>>>>>>>>>> mapper);
> >>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<?
> super
> >>>>> V,
> >>>>>> ?
> >>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>> VR>
> >>>>>>>>>>>>>>>>> mapper);
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> and in runtime (inside processor) we still have to check
> it
> >>>>> is
> >>>>>>>>>>>>>>>> ValueMapper
> >>>>>>>>>>>>>>>>> or ValueMapperWithKey before wrapping it into the rich
> >>>>>> function.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Please correct me if I am wrong.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
> >>>>>>>>>>>>>>>>> michal.borowie...@openbet.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> +1 :)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I was reading the updated KIP and I am wondering, if we
> >>>>>> should
> >>>>>>>>> do
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> design a little different.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Instead of distinguishing between a RichFunction and
> >>>>>>>>>>>> non-RichFunction
> >>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>> runtime level, we would use RichFunctions all the time.
> >>>>>> Thus,
> >>>>>>> on
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> DSL
> >>>>>>>>>>>>>>>>>>> entry level, if a user provides a non-RichFunction, we
> >>>>> wrap
> >>>>>> it
> >>>>>>>>>>> by a
> >>>>>>>>>>>>>>>>>>> RichFunction that is fully implemented by Streams. This
> >>>>>>>>>>>> RichFunction
> >>>>>>>>>>>>>>>>>>> would just forward the call omitting the key:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Just to sketch the idea (incomplete code snippet):
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> public StreamsRichValueMapper implements
> >>>>> RichValueMapper()
> >>>>>> {
> >>>>>>>>>>>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
> >>>>>>>>> constructor
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>    public VR apply(final K key, final V1 value1,
> final V2
> >>>>>>>>>>> value2)
> >>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>        return userProvidedMapper(value1, value2);
> >>>>>>>>>>>>>>>>>>>>    }
> >>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>  From a performance point of view, I am not sure if the
> >>>>>>>>>>>>>>>>>>> "if(isRichFunction)" including casts etc would have
> more
> >>>>>>>>> overhead
> >>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>> this approach (we would do more nested method call for
> >>>>>>>>>>>>>> non-RichFunction
> >>>>>>>>>>>>>>>>>>> which should be more common than RichFunctions).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> This approach should not effect lambdas (or do I miss
> >>>>>>>>> something?)
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> might be cleaner, as we could have one more top level
> >>>>>>> interface
> >>>>>>>>>>>>>>>>>>> `RichFunction` with methods `init()` and `close()` and
> >>>>> also
> >>>>>>>>>>>>>> interfaces
> >>>>>>>>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes
> >>>>>>> required).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Any thoughts?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to include
> >>>>> rich
> >>>>>>>>>>>>>> functions.
> >>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>> will still have to evaluate the cost of deep copying
> of
> >>>>>> keys.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
> >>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hey Matthias,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> My opinion would be that documenting the
> immutability of
> >>>>>> the
> >>>>>>>>>>> key
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> best approach available.  Better than requiring the
> key
> >>>>> to
> >>>>>>> be
> >>>>>>>>>>>>>>>>>> serializable
> >>>>>>>>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no
> performance
> >>>>>> risk.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> It'd be different if Java had immutable type
> constraints
> >>>>>> of
> >>>>>>>>>>> some
> >>>>>>>>>>>>>>>> kind.
> >>>>>>>>>>>>>>>>>> :-)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
> >>>>>>>>>>>>>>>>>> matth...@confluent.io>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this path,
> it
> >>>>>>> should
> >>>>>>>>>>>> cover
> >>>>>>>>>>>>>>>>>>>>>> all(?) DSL interfaces.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> About guarding the key -- I am still not sure what
> to
> >>>>> do
> >>>>>>>>> about
> >>>>>>>>>>>>>> it...
> >>>>>>>>>>>>>>>>>>>>>> Maybe it might be enough to document it (and name
> the
> >>>>> key
> >>>>>>>>>>>>>> parameter
> >>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately, I
> >>>>> would
> >>>>>>>>>>> prefer
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> guard against any modification, but I have no good
> idea
> >>>>>> how
> >>>>>>>>> to
> >>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>> this.
> >>>>>>>>>>>>>>>>>>>>>> Not sure what others think about the risk of
> corrupted
> >>>>>>>>>>>>>> partitioning
> >>>>>>>>>>>>>>>>>>>>>> (what would be a user error and we could say, well,
> bad
> >>>>>>> luck,
> >>>>>>>>>>>> you
> >>>>>>>>>>>>>>>> got
> >>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs deep
> copy
> >>>>>> with
> >>>>>>> a
> >>>>>>>>>>>>>>>> potential
> >>>>>>>>>>>>>>>>>>>>>> performance hit (that we can't quantity atm without
> any
> >>>>>>>>>>>>>> performance
> >>>>>>>>>>>>>>>>>>>>> test).
> >>>>>>>>>>>>>>>>>>>>>> We do have a performance system test. Maybe it's
> worth
> >>>>>> for
> >>>>>>>>> you
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> apply
> >>>>>>>>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's very
> >>>>> basic
> >>>>>>>>>>>>>> performance
> >>>>>>>>>>>>>>>>>>>>>> test only, but might give some insight. If you want
> to
> >>>>> do
> >>>>>>>>>>> this,
> >>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>>>>>> into folder "tests" for general test setup, and into
> >>>>>>>>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find
> the
> >>>>>> perf
> >>>>>>>>>>>> test.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I think extending KIP to include RichFunctions
> totally
> >>>>>>>>> makes
> >>>>>>>>>>>>>>>> sense.
> >>>>>>>>>>>>>>>>>>>>> So,
> >>>>>>>>>>>>>>>>>>>>>>>   we don't want to guard the keys because it is
> >>>>> costly.
> >>>>>>>>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it should
> not be
> >>>>>>>>>>> limited
> >>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>> the 3
> >>>>>>>>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of
> >>>>>>> interfaces.
> >>>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
> >>>>>>>>>>>>>>>>>> matth...@confluent.io
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> One follow up. There was this email on the user
> list:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
> >>>>>>>>>>> key+
> >>>>>>>>>>>>>>>>>>>>>>>> It might make sense so include Initializer, Adder,
> >>>>> and
> >>>>>>>>>>>>>> Substractor
> >>>>>>>>>>>>>>>>>>>>>>>> inferface, too.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> And we should double check if there are other
> >>>>> interface
> >>>>>>> we
> >>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>> miss
> >>>>>>>>>>>>>>>>>>>>>> atm.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I am
> >>>>>>> actually
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>>> little
> >>>>>>>>>>>>>>>>>>>>> bit
> >>>>>>>>>>>>>>>>>>>>>>>>> worried about performance impact... We might
> want to
> >>>>>> do
> >>>>>>>>>>> some
> >>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>> quantify this impact.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of
> `RichFunction`
> >>>>>>>>>>>> interface
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> would allow users to access record metadata (like
> >>>>>>>>>>> timestamp,
> >>>>>>>>>>>>>>>>>> offset,
> >>>>>>>>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a
> similar
> >>>>>>>>> concept.
> >>>>>>>>>>>>>>>> Thus, I
> >>>>>>>>>>>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge the
> >>>>> scope
> >>>>>>> of
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>> that? WDYT?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mathieu,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar approach
> >>>>> and
> >>>>>>>>>>> updated
> >>>>>>>>>>>>>> PR
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in
> Processors
> >>>>>>>>>>> sending
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> copy
> >>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>> actual key.
> >>>>>>>>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I
> think
> >>>>>>> memory
> >>>>>>>>>>>> can
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>> bottleneck
> >>>>>>>>>>>>>>>>>>>>>>>>>> in some use-cases.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu Fenniak <
> >>>>>>>>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper
> (...etc) to
> >>>>>> be
> >>>>>>>>>>>>>> classes,
> >>>>>>>>>>>>>>>>>>>>> rather
> >>>>>>>>>>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards
> incompatible
> >>>>>>>>>>> change.
> >>>>>>>>>>>>>> An
> >>>>>>>>>>>>>>>>>>>>>>>> alternative
> >>>>>>>>>>>>>>>>>>>>>>>>>>> approach that would be backwards compatible
> would
> >>>>> be
> >>>>>>> to
> >>>>>>>>>>>>>> define
> >>>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those
> >>>>>>> interfaces
> >>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>> used.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as
> "final"
> >>>>>>>>>>> doesn't
> >>>>>>>>>>>>>>>> change
> >>>>>>>>>>>>>>>>>>>>>> much
> >>>>>>>>>>>>>>>>>>>>>>>>>>> about guarding against key change.  It only
> >>>>> prevents
> >>>>>>> the
> >>>>>>>>>>>>>>>>>> parameter
> >>>>>>>>>>>>>>>>>>>>>>>> variable
> >>>>>>>>>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a
> >>>>> mutable
> >>>>>>>>>>> object
> >>>>>>>>>>>>>>>> (eg.
> >>>>>>>>>>>>>>>>>>>>>>>> byte[]),
> >>>>>>>>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).  But
> >>>>> I'm
> >>>>>>> not
> >>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>> sure
> >>>>>>>>>>>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>>>>>>>>>>>>> much that can be done about that.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun Karimov
> <
> >>>>>>>>>>>>>>>>>>>>> je.kari...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can
> guard
> >>>>> for
> >>>>>>>>>>>>>> immutable
> >>>>>>>>>>>>>>>>>> keys
> >>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> current implementation (with few changes), I
> >>>>> didn't
> >>>>>>>>>>>> consider
> >>>>>>>>>>>>>>>>>>>>>> backward
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In
> both
> >>>>>>>>> cases,
> >>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>> accesses
> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type
> >>>>> parameter
> >>>>>>>>> will
> >>>>>>>>>>>>>> break
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to cast
> to
> >>>>>>> actual
> >>>>>>>>>>>> key
> >>>>>>>>>>>>>>>>>> type.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method with
> 2
> >>>>>>>>> argument
> >>>>>>>>>>>>>> (key
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> value)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing this,  I
> >>>>>> think
> >>>>>>> we
> >>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>> address
> >>>>>>>>>>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against
> key
> >>>>>>> change.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess
> like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> public class KeyAccess {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>      Object key;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object
> key) {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>          this.key = key;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>      public Object getKey() {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>          return key;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
> >>>>>>>>>>>>>>>> *ValueTransformer*
> >>>>>>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
> >>>>>>>>>>>>>>>>>>>>>> *KTableMapValuesProcessor*)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can
> set
> >>>>> the
> >>>>>>>>>>> *key*
> >>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user
> can
> >>>>>> use
> >>>>>>>>>>>>>>>> *getKey()*
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J.
> Sax <
> >>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think there are two issues we need to
> address:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward
> >>>>>>> compatibility.
> >>>>>>>>>>>> Users
> >>>>>>>>>>>>>>>> did
> >>>>>>>>>>>>>>>>>>>>>>>>>>> complain
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> about this in past releases already, and as
> the
> >>>>>> user
> >>>>>>>>>>> base
> >>>>>>>>>>>>>>>>>> grows,
> >>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> should not break backward compatibility in
> >>>>> future
> >>>>>>>>>>>> releases
> >>>>>>>>>>>>>>>>>>>>> anymore.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to
> allow
> >>>>> key
> >>>>>>>>>>>> access.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same
> direction
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile
> >>>>>> failures
> >>>>>>>>>>> that
> >>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no
> guard
> >>>>> to
> >>>>>>>>>>> prevent
> >>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt
> partitioning
> >>>>> if
> >>>>>>>>>>> users
> >>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>> alter
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> key (accidentally -- or users are just not
> aware
> >>>>>>> that
> >>>>>>>>>>>> they
> >>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> allowed to modify the provided key object)
> and
> >>>>>> thus
> >>>>>>>>>>> break
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> application. (This was the original
> motivation
> >>>>> to
> >>>>>>> not
> >>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> key
> >>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the first place -- it's guards against
> >>>>>>> modification.)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/1/17 6:31 AM, Mathieu Fenniak wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I just want to add my voice that, I too,
> have
> >>>>>>> wished
> >>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> record key during a mapValues or similar
> >>>>>> operation.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile
> >>>>> failures
> >>>>>>>>> that
> >>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
> But
> >>>>>> at
> >>>>>>>>>>> least
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty clear and easy change.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 6:55 AM, Jeyhun
> Karimov
> >>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>> je.kari...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dear community,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to share KIP-149 [1] based on issues
> >>>>>>>>>>> KAFKA-4218
> >>>>>>>>>>>>>> [2],
> >>>>>>>>>>>>>>>>>>>>>>>>>>> KAFKA-4726
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3],
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KAFKA-3745 [4]. The related PR can be
> found at
> >>>>>>> [5].
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to get your comments.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/
> >>>>>>>>>>> confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira
> >>>>>>>>> /browse/KAFKA-4218
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira
> >>>>>>>>> /browse/KAFKA-4726
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira
> >>>>>>>>> /browse/KAFKA-3745
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [5]
> https://github.com/apache/kafka/pull/2946
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>

-- 
-Cheers

Jeyhun

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to