Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Guozhang Wang Sun, 04 Jun 2017 18:24:45 -0700

With KIP-159, the added "valueMapperWithKey" and "valueTransformerWithKey"
along seem to be just adding a few syntax-sugars? E.g. we can simply
implement:


mapValues(ValueMapperWithKey mapperWithKey)

as

map((k, v) -> (k, mapperWithKey(k, v))

----------------------

I'm not sure how many users would really need such syntax sugars, and even
they do, we can support them easily as the above implementations;

Similarly for "ReducerWithKey", it can be implemented as `Aggregator<K, V,
V>`, and people who needs key access can just use `aggregate` directly.

The function which I think is really of valuable is `ValueJoinerWithKey`,
and that seems to be the original request from users while others are
coming from "API consistency" concerns. Personally I'd prefer only keep the
last one (`InitializerWithKey` might also have some value, but I have not
seen people widely requesting it in their DSL usage yet; if there is a
common request we can keep it in this KIP as well). WDYT?

Guozhang


On Sun, May 28, 2017 at 10:16 AM, Jeyhun Karimov <je.kari...@gmail.com>
wrote:

> Thanks for clarification Matthias, now everything is clear.
>
> On Sun, May 28, 2017 at 6:21 PM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > I don't think we can drop ValueTransformerSupplier. If you don't have an
> > supplier, you only get a single instance of your function. But for a
> > stateful transformation, we need multiple instances (one for each task)
> > of ValueTransformer.
> >
> > We don't need suppliers for functions like "ValueMapper" etc because
> > those are stateless and thus, we can reuse a single instance over
> > multiple tasks -- but we can't do this for ValueTransformer (and
> similar).
> >
> > Btw: This reminds me about KIP-159: with regard to the RichFunction we
> > might need a supplier pattern, too. (I'll comment on the other thread,
> > too.)
> >
> >
> > -Matthias
> >
> > On 5/28/17 5:45 AM, Jeyhun Karimov wrote:
> > > Hi,
> > >
> > > I updated KIP.
> > > Just to avoid misunderstanding, I meant deprecating
> > ValueTransformerSupplier
> > > and I am ok with ValueTransformer.
> > > So instead of using ValueTransformerSupplier can't we directly use
> > > ValueTransformer
> > > or ValueTransformerWithKey?
> > >
> > > Btw, in current design all features of ValueTransformer will be
> available
> > > in  ValueTransformerWithKey interface.
> > >
> > > Cheers,
> > > Jeyhun
> > >
> > > On Sun, May 28, 2017 at 6:15 AM Matthias J. Sax <matth...@confluent.io
> >
> > > wrote:
> > >
> > >> Thanks Jeyhun.
> > >>
> > >> About ValueTransformer: I don't think we can remove it. Note,
> > >> ValueTransformer allows to attach a state and also allows to register
> > >> punctuations. Both those features will not be available via withKey()
> > >> interfaces.
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 5/27/17 1:25 PM, Jeyhun Karimov wrote:
> > >>> Hi Matthias,
> > >>>
> > >>> Thanks for your comments.
> > >>>
> > >>> I tested the deep copy approach. It has significant overhead.
> > Especially
> > >>> for "light" and stateless operators it slows down the topology
> > >>> significantly (> 20% ). I think "warning"  users about not-changing
> the
> > >> key
> > >>> is better warning them about possible performance loss.
> > >>>
> > >>> About the interfaces, additionally I considered adding
> > >> InitializerWithKey,
> > >>> AggregatorWithKey and ValueTransformerWithKey. I think they are
> > included
> > >> in
> > >>> PR but not in KIP. I will also include them in KIP, sorry my bad.
> > >>> Including ReducerWithKey definitely makes sense. Thanks.
> > >>>
> > >>> One thing I want to mention is that, maybe we should deprecate
> methods
> > >> with
> > >>> argument type ValueTransformerSupplier (KStream.transformValues(...))
> > and
> > >>> and as a whole the ValueTransformerSupplier interface.
> > >>> We can use ValueTransformer/ValueTransformerWithKey type instead
> > without
> > >>> additional supplier layer.
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Jeyhun
> > >>>
> > >>>
> > >>> On Thu, May 25, 2017 at 1:07 AM Matthias J. Sax <
> matth...@confluent.io
> > >
> > >>> wrote:
> > >>>
> > >>>> One more question:
> > >>>>
> > >>>> Should we add any of
> > >>>>  - InitizialierWithKey
> > >>>>  - ReducerWithKey
> > >>>>  - ValueTransformerWithKey
> > >>>>
> > >>>> To get consistent/complete API, it might be a good idea. Any
> thoughts?
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>> On 5/24/17 3:47 PM, Matthias J. Sax wrote:
> > >>>>> Jeyhun,
> > >>>>>
> > >>>>> I was just wondering if you did look into the key-deep-copy idea we
> > >>>>> discussed. I am curious to see what the impact might be.
> > >>>>>
> > >>>>>
> > >>>>> -Matthias
> > >>>>>
> > >>>>> On 5/20/17 2:03 AM, Jeyhun Karimov wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> Thanks for your comments. I rethink about including rich functions
> > >> into
> > >>>>>> this KIP.
> > >>>>>> I think once we include rich functions in this KIP and then fix
> > >>>>>> ProcessorContext in another KIP and incorporate with existing rich
> > >>>>>> functions, the code will not be backwards compatible.
> > >>>>>>
> > >>>>>> I see Damian's and your point more clearly now.
> > >>>>>>
> > >>>>>> Conclusion: we include only withKey interfaces in this KIP (I
> > updated
> > >>>> the
> > >>>>>> KIP), I will try also initiate another KIP for rich functions.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Jeyhun
> > >>>>>>
> > >>>>>> On Fri, May 19, 2017 at 10:50 PM Matthias J. Sax <
> > >> matth...@confluent.io
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> With the current KIP, using ValueMapper and ValueMapperWithKey
> > >>>>>>> interfaces, RichFunction seems to be an independent add-on. To
> fix
> > >> the
> > >>>>>>> original issue to allow key access, RichFunctions are not
> required
> > >>>> IMHO.
> > >>>>>>>
> > >>>>>>> I initially put the RichFunction idea on the table, because I was
> > >>>> hoping
> > >>>>>>> to get a uniform API. And I think, is was good to consider them
> --
> > >>>>>>> however, the discussion showed that they are not necessary for
> key
> > >>>>>>> access. Thus, it seems to be better to handle RichFunctions in an
> > own
> > >>>>>>> KIP. The ProcessorContext/RecordContext issues seems to be a main
> > >>>>>>> challenge for this. And introducing RichFunctions with
> > parameter-less
> > >>>>>>> init() method, seem not to help too much. We would get an
> > >>>> "intermediate"
> > >>>>>>> API that we want to change anyway later on...
> > >>>>>>>
> > >>>>>>> As you put already much effort into RichFunction, feel free do
> push
> > >>>> this
> > >>>>>>> further and start a new KIP (we can do this even in parallel) --
> we
> > >>>>>>> don't want to slow you down :) But it make the discussion and
> code
> > >>>>>>> review easier, if we separate both IMHO.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> -Matthias
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 5/19/17 2:25 AM, Jeyhun Karimov wrote:
> > >>>>>>>> Hi Damian,
> > >>>>>>>>
> > >>>>>>>> Thanks for your comments. I think providing to users *interface*
> > >>>> rather
> > >>>>>>>> than *abstract class* should be preferred (Matthias also raised
> > this
> > >>>>>>> issue
> > >>>>>>>> ), anyway I changed the corresponding parts of KIP.
> > >>>>>>>>
> > >>>>>>>> Regarding with passing additional contextual information, I
> think
> > it
> > >>>> is a
> > >>>>>>>> tradeoff,
> > >>>>>>>> 1) first, we fix the context parameter for *init() *method in
> > >> another
> > >>>> PR
> > >>>>>>>> and solve Rich functions afterwards
> > >>>>>>>> 2) first, we fix the requested issues on jira ([1-3]) with
> > providing
> > >>>>>>>> (not-complete) Rich functions and integrate the context
> parameters
> > >> to
> > >>>>>>> this
> > >>>>>>>> afterwards (like improvement)
> > >>>>>>>>
> > >>>>>>>> To me, the second approach seems more incremental. However you
> are
> > >>>> right,
> > >>>>>>>> the names might confuse the users.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> [1] https://issues.apache.org/jira/browse/KAFKA-4218
> > >>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-4726
> > >>>>>>>> [3] https://issues.apache.org/jira/browse/KAFKA-3745
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Jeyhun
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Fri, May 19, 2017 at 10:42 AM Damian Guy <
> damian....@gmail.com
> > >
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I see you've removed the `ProcessorContext` from the
> RichFunction
> > >>>> which
> > >>>>>>> is
> > >>>>>>>>> good, but why is it a `RichFunction`? I'd have expected it to
> > pass
> > >>>> some
> > >>>>>>>>> additional contextual information, i.e., the `RecordContext`
> that
> > >>>>>>> contains
> > >>>>>>>>> just the topic, partition, timestamp, offset.  I'm ok with it
> not
> > >>>>>>> passing
> > >>>>>>>>> this contextual information, but is the naming incorrect? I'm
> not
> > >>>> sure,
> > >>>>>>>>> tbh. I'm wondering if we should drop `RichFunctions` until we
> can
> > >> do
> > >>>> it
> > >>>>>>>>> properly with the correct context?
> > >>>>>>>>>
> > >>>>>>>>> Also, i don't like the abstract classes: RichValueMapper,
> > >>>>>>> RichValueJoiner,
> > >>>>>>>>> RichInitializer etc. Why can't they not just be interfaces?
> > >>>> Generally we
> > >>>>>>>>> should provide users with Intefaces to code against, not
> classes.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Damian
> > >>>>>>>>>
> > >>>>>>>>> On Fri, 19 May 2017 at 00:50 Jeyhun Karimov <
> > je.kari...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks. I initiated the PR as well, to get a better overview.
> > >>>>>>>>>>
> > >>>>>>>>>> The only reason that I used abstract class instead of
> interface
> > >> for
> > >>>>>>> Rich
> > >>>>>>>>>> functions is that in future if we will have some
> > >>>> AbstractRichFunction
> > >>>>>>>>>> abstract classes,
> > >>>>>>>>>> we can easily extend:
> > >>>>>>>>>>
> > >>>>>>>>>> public abstract class RichValueMapper<K, V, VR>  implements
> > >>>>>>>>>> ValueMapperWithKey<K, V, VR>, RichFunction *extends
> > >>>>>>>>> AbstractRichFunction*{
> > >>>>>>>>>> }
> > >>>>>>>>>>  With interfaces we are only limited to interfaces for
> > >> inheritance.
> > >>>>>>>>>>
> > >>>>>>>>>> However, I think we can easily change it (from abstract class
> ->
> > >>>>>>>>> interface)
> > >>>>>>>>>> if you think interface is a better fit.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Jeyhun
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax <
> > >>>> matth...@confluent.io
> > >>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Thanks for the update and explanations. The KIP is quite
> > improved
> > >>>> now
> > >>>>>>>>> --
> > >>>>>>>>>>> great job!
> > >>>>>>>>>>>
> > >>>>>>>>>>> One more question: Why are RichValueMapper etc abstract
> classes
> > >> and
> > >>>>>>> not
> > >>>>>>>>>>> interfaces?
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Overall, I like the KIP a lot!
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -Matthias
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks for your comments.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think supporting Lambdas for `withKey` and
> > >>>> `AbstractRichFunction`
> > >>>>>>>>>>>>> don't go together, as Lambdas are only supported for
> > interfaces
> > >>>>>>>>> AFAIK.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Maybe I misunderstood your comment.
> > >>>>>>>>>>>> *withKey* and and *withOnlyValue* are interfaces. So they
> > don't
> > >>>> have
> > >>>>>>>>>>> direct
> > >>>>>>>>>>>> relation with *AbstractRichFunction*.
> > >>>>>>>>>>>> *withKey* and and *withOnlyValue* interfaces have only one
> > >>>> method ,
> > >>>>>>>>> so
> > >>>>>>>>>>> we
> > >>>>>>>>>>>> can use lambdas.
> > >>>>>>>>>>>> Where does the *AbstractRichFunction* comes to play? Inside
> > Rich
> > >>>>>>>>>>> functions.
> > >>>>>>>>>>>> And we use Rich functions in 2 places:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 1. User doesn't use rich functions. Just regular *withKey*
> and
> > >> and
> > >>>>>>>>>>>> *withOnlyValue* interfaces(both support lambdas) . We get
> > those
> > >>>>>>>>>>> interfaces
> > >>>>>>>>>>>> and wrap into Rich function while building the topology, and
> > >> send
> > >>>> to
> > >>>>>>>>>>>> Processor.
> > >>>>>>>>>>>> 2. User does use rich functions (Rich functions implement
> > >>>> *withKey*
> > >>>>>>>>>>>> interface). As a result no lamdas here by definition. In
> this
> > >>>> case,
> > >>>>>>>>>> while
> > >>>>>>>>>>>> building the topology we do a type check if the object type
> is
> > >>>>>>>>>>>> *withKey* or *RichFunction*.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> So *AbstractRichFunction* is just syntactic sugar for Rich
> > >>>> functions
> > >>>>>>>>>> and
> > >>>>>>>>>>>> does not affect using lambdas.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need
> to
> > >>>> have a
> > >>>>>>>>>>>>> interface approach like this
> > >>>>>>>>>>>>>   - RichFunction -> only adding init() and close()
> > >>>>>>>>>>>>>   - ValueMapper
> > >>>>>>>>>>>>>   - ValueMapperWithKey
> > >>>>>>>>>>>>>   - RichValueMapper extends ValueMapperWithKey,
> RichFunction
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> As I said above, currently we support lambdas for *withKey*
> > >>>>>>>>> interfaces
> > >>>>>>>>>> as
> > >>>>>>>>>>>> well.  However, I agree with your idea and I will remove the
> > >>>>>>>>>>>> AbstractRichFunction from the design.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to
> > >>>> support
> > >>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended
> > >> API".
> > >>>>>>>>> For
> > >>>>>>>>>>>>> this, RichFunction could add key+init+close and
> > >>>> AbstractRichFunction
> > >>>>>>>>>>>>> would allow to only care about getting the key.
> > >>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of
> more
> > >>>>>>>>>> overloaded
> > >>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much
> > >> because
> > >>>> we
> > >>>>>>>>>> have
> > >>>>>>>>>>>>> already so many overlaods. On the other hand, I do see
> value
> > in
> > >>>>>>>>>>>>> supporting Lambdas for `withKey`.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Just to clarify, with current design we have only one extra
> > >>>>>>>>> overloaded
> > >>>>>>>>>>>> method per *withOnlyValue* interface:  which is *withKey*
> > >> version
> > >>>> of
> > >>>>>>>>>>>> particular interface.
> > >>>>>>>>>>>> We don't need extra overload for Rich function as Rich
> > function
> > >>>>>>>>>>> implements
> > >>>>>>>>>>>> *withKey* interface as a result they have same type. We
> > >>>> differentiate
> > >>>>>>>>>>> them
> > >>>>>>>>>>>> while building the topology.
> > >>>>>>>>>>>> We supported lambdas for *withKey* APIs because of the
> > comment:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> @Jeyhun: I did not put any thought into this, but can we
> have
> > a
> > >>>>>>>>> design
> > >>>>>>>>>>>>> that allows for both? Also, with regard to lambdas, it
> might
> > >> make
> > >>>>>>>>>> sense
> > >>>>>>>>>>>>> to allow for both `V -> newV` and `(K, V) -> newV` ?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> However, I don't think that this complicates the overall
> > design
> > >>>>>>>>>>>> significantly.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Depending on what we want to support, it might make sense to
> > >>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus,
> this
> > >>>> also
> > >>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP"
> before
> > >>>> driving
> > >>>>>>>>>>>>> this KIP further.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Based on our discussion I think we should keep Rich
> functions
> > >> as I
> > >>>>>>>>>> don't
> > >>>>>>>>>>>> think that they bring extra layer of overhead to library.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Any comments are appreciated.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <
> > >>>>>>>>>> matth...@confluent.io>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Jeyhun,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> thanks for the update.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I think supporting Lambdas for `withKey` and
> > >>>> `AbstractRichFunction`
> > >>>>>>>>>>>>> don't go together, as Lambdas are only supported for
> > interfaces
> > >>>>>>>>> AFAIK.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need
> to
> > >>>> have a
> > >>>>>>>>>>>>> interface approach like this
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>   - RichFunction -> only adding init() and close()
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>   - ValueMapper
> > >>>>>>>>>>>>>   - ValueMapperWithKey
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>   - RichValueMapper extends ValueMapperWithKey,
> RichFunction
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> For this approach, AbstractRichFunction does not make sense
> > >>>> anymore,
> > >>>>>>>>>> as
> > >>>>>>>>>>>>> the only purpose of `RichFunction` is to allow the
> > >>>> implementation of
> > >>>>>>>>>>>>> init() and close() -- if you don't want those, you would
> > >>>> implement a
> > >>>>>>>>>>>>> different interface (ie, ValueMapperWithKey)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> As an alternative, we could argue, that it is sufficient to
> > >>>> support
> > >>>>>>>>>>>>> Lambdas for the "plain" API only, but not for any "extended
> > >> API".
> > >>>>>>>>> For
> > >>>>>>>>>>>>> this, RichFunction could add key+init+close and
> > >>>> AbstractRichFunction
> > >>>>>>>>>>>>> would allow to only care about getting the key.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Not sure, which one is better. I don't like the idea of
> more
> > >>>>>>>>>> overloaded
> > >>>>>>>>>>>>> methods to get Lambdas for `withKey` interfaces too much
> > >> because
> > >>>> we
> > >>>>>>>>>> have
> > >>>>>>>>>>>>> already so many overlaods. On the other hand, I do see
> value
> > in
> > >>>>>>>>>>>>> supporting Lambdas for `withKey`.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Depending on what we want to support, it might make sense
> to
> > >>>>>>>>>>>>> include/exclude RichFunctions from this KIP -- and thus,
> this
> > >>>> also
> > >>>>>>>>>>>>> determines if we should have a "ProcessorContext KIP"
> before
> > >>>> driving
> > >>>>>>>>>>>>> this KIP further.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thoughts?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sorry for super late response. Thanks for your comments.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little
> > >> bit? I
> > >>>>>>>>>> cannot
> > >>>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem
> > is.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> - From [1] says "A functional interface is an interface
> that
> > >> has
> > >>>>>>>>> just
> > >>>>>>>>>>> one
> > >>>>>>>>>>>>>> abstract method, and thus represents a single function
> > >>>> contract".
> > >>>>>>>>>>>>>> So basically once we extend some interface from another
> (in
> > >> our
> > >>>>>>>>> case,
> > >>>>>>>>>>>>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas
> > in
> > >>>> the
> > >>>>>>>>>>>>> extended
> > >>>>>>>>>>>>>> interface.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Further comments:
> > >>>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe
> > >> reformat
> > >>>> the
> > >>>>>>>>>>> wiki
> > >>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> - I will work on the KIP.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have
> overlaods
> > >>>> added
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to
> add
> > >> any
> > >>>> new
> > >>>>>>>>>>>>>>> overloads)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> - Actually there are more than one Processor and public
> APIs
> > >> to
> > >>>> be
> > >>>>>>>>>>>>>> changed (KStream-KTable
> > >>>>>>>>>>>>>> joins is one case). However all of them has similar
> > structure:
> > >>>> we
> > >>>>>>>>>>>>> overload
> > >>>>>>>>>>>>>> the *method* with  *methodWithKey*,
> > >>>>>>>>>>>>>> wrap it into the Rich function, send to processor and
> inside
> > >> the
> > >>>>>>>>>>>>> processor
> > >>>>>>>>>>>>>> call *init* and *close* methods of the Rich function.
> > >>>>>>>>>>>>>> As I wrote in KIP, I wanted to demonstrate the overall
> idea
> > >> with
> > >>>>>>>>> only
> > >>>>>>>>>>>>>> *ValueMapper* as the same can be applied to all changes.
> > >>>>>>>>>>>>>> Anyway I will update the KIP.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Instead of overriding the *init(ProcessorContext p)* and*
> > >>>> close()*
> > >>>>>>>>>>>>> methods
> > >>>>>>>>>>>>>> in every Rich function with empty body like:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @Override
> > >>>>>>>>>>>>>> void init(ProcessorContext context) {}
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> @Override
> > >>>>>>>>>>>>>> void close () {}
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I thought that we can override them once in
> > >>>> *AbstractRichFunction*
> > >>>>>>>>>> and
> > >>>>>>>>>>>>>> extent new Rich functions from *AbstractRichFunction*.
> > >>>>>>>>>>>>>> Basically this can eliminate code copy-paste and ease the
> > >>>>>>>>>> maintenance.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction,
> Merger,
> > >>>>>>>>>> Predicate,
> > >>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to
> all,
> > >> but
> > >>>> we
> > >>>>>>>>>>> should
> > >>>>>>>>>>>>>>> discuss all of them and add where it does make sense
> (e.g.,
> > >>>>>>>>>>>>>>> RichForachAction does make sense IMHO)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Definitely agree. As I said, the same technique applies to
> > all
> > >>>> this
> > >>>>>>>>>>>>>> interfaces and I didn't want to explode the KIP, just
> wanted
> > >> to
> > >>>>>>>>> give
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> overall intuition.
> > >>>>>>>>>>>>>> However, I will update the KIP as I said.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> > >>>>>>>>>>> `RichValueXX`
> > >>>>>>>>>>>>>>> in general -- but why can't we do all this with
> interfaces
> > >>>> only?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sure we can. However the main intuition is we should not
> > force
> > >>>>>>>>> users
> > >>>>>>>>>> to
> > >>>>>>>>>>>>>> implement *init(ProcessorContext)* and *close()* functions
> > >> every
> > >>>>>>>>> time
> > >>>>>>>>>>>>> they
> > >>>>>>>>>>>>>> use Rich functions.
> > >>>>>>>>>>>>>> If one needs, she can override the respective methods.
> > >> However,
> > >>>> I
> > >>>>>>>>> am
> > >>>>>>>>>>> open
> > >>>>>>>>>>>>>> for discussion.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread
> any
> > >>>>>>>>> further
> > >>>>>>>>>>> than
> > >>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is
> done
> > >>>> before
> > >>>>>>>>>>> this?
> > >>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too
> > >> large.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> That is good point. I wanted to make
> > *init(ProcessorContext)*
> > >>>>>>>>> method
> > >>>>>>>>>>>>>> persistent among the library (which use ProcessorContext
> as
> > an
> > >>>>>>>>>> input),
> > >>>>>>>>>>>>>> therefore I put *ProcessorContext* as an input.
> > >>>>>>>>>>>>>> So the important question is that (as @dguy and @mjsax
> > >>>> mentioned)
> > >>>>>>>>>>> whether
> > >>>>>>>>>>>>>> continue this KIP without providing users an access to
> > >>>>>>>>>>> *ProcessorContext*
> > >>>>>>>>>>>>>> (change *init (ProcessorContext)* to * init()* ) or
> > >>>>>>>>>>>>>> initiate another KIP before this.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>
> > http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
> > >>>>>>>>>>>>> se-8-jls-pfd-diffs.pdf
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <
> > >>>> damian....@gmail.com>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread
> > any
> > >>>>>>>>> further
> > >>>>>>>>>>>>> than
> > >>>>>>>>>>>>>>> it currently is. So maybe we need another KIP that is
> done
> > >>>> before
> > >>>>>>>>>>> this?
> > >>>>>>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too
> > >> large.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <
> > >>>>>>>>> matth...@confluent.io
> > >>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree that that `ProcessorContext` interface is too
> > broad
> > >> in
> > >>>>>>>>>>> general
> > >>>>>>>>>>>>>>>> -- this is even true for transform/process, and it's
> also
> > >>>>>>>>> reflected
> > >>>>>>>>>>> in
> > >>>>>>>>>>>>>>>> the API improvement list we want to do.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
> > >>>>>>>>>>>>>>> Kafka+Streams+Discussions
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> So I am wondering, if you question the `RichFunction`
> > >>>> approach in
> > >>>>>>>>>>>>>>>> general? Or if you suggest to either extend the scope of
> > >> this
> > >>>> KIP
> > >>>>>>>>>> to
> > >>>>>>>>>>>>>>>> include this---or maybe better, do another KIP for it
> and
> > >>>> delay
> > >>>>>>>>>> this
> > >>>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>> until the other one is done?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
> > >>>>>>>>>>>>>>>>> Thanks for the KIP.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I'm not convinced on the `RichFunction` approach. Do we
> > >>>> really
> > >>>>>>>>>> want
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> give
> > >>>>>>>>>>>>>>>>> every DSL method access to the `ProcessorContext` ? It
> > has
> > >> a
> > >>>>>>>>> bunch
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>> methods on it that seem in-appropriate for some of the
> > DSL
> > >>>>>>>>>> methods,
> > >>>>>>>>>>>>>>> i.e,
> > >>>>>>>>>>>>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc.
> > It
> > >> is
> > >>>>>>>>> far
> > >>>>>>>>>>> too
> > >>>>>>>>>>>>>>>>> broad. I think it would be better to have a narrower
> > >>>> interface
> > >>>>>>>>>> like
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>> `RecordContext`  - remembering it is easier to add
> > >>>>>>>>>>> methods/interfaces
> > >>>>>>>>>>>>>>>> later
> > >>>>>>>>>>>>>>>>> than to remove them
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <
> > >>>>>>>>>> matth...@confluent.io
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Jeyhun,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a
> > little
> > >>>> bit?
> > >>>>>>>>> I
> > >>>>>>>>>>>>>>> cannot
> > >>>>>>>>>>>>>>>>>> follow the explanation in the KIP to see what the
> > problem
> > >>>> is.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> For updating the KIP title I don't know -- guess it's
> up
> > >> to
> > >>>>>>>>> you.
> > >>>>>>>>>>>>>>> Maybe a
> > >>>>>>>>>>>>>>>>>> committer can comment on this?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Further comments:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe
> > >>>> reformat
> > >>>>>>>>>> the
> > >>>>>>>>>>>>>>> wiki
> > >>>>>>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would
> help.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have
> > >> overlaods
> > >>>>>>>>>> added
> > >>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to
> > add
> > >>>> any
> > >>>>>>>>>> new
> > >>>>>>>>>>>>>>>>>> overloads)
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction,
> > >> Merger,
> > >>>>>>>>>>>>>>> Predicate,
> > >>>>>>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to
> > all,
> > >>>> but
> > >>>>>>>>> we
> > >>>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>>>> discuss all of them and add where it does make sense
> > >> (e.g.,
> > >>>>>>>>>>>>>>>>>> RichForachAction does make sense IMHO)
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` --
> `ValueXXWithKey`
> > --
> > >>>>>>>>>>>>>>> `RichValueXX`
> > >>>>>>>>>>>>>>>>>> in general -- but why can't we do all this with
> > interfaces
> > >>>>>>>>> only?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks for your comments. I think we cannot extend
> the
> > >> two
> > >>>>>>>>>>>>> interfaces
> > >>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I
> > >> should
> > >>>>>>>>>> change
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> title, because now we are not limiting the KIP to
> only
> > >>>>>>>>>>> ValueMapper,
> > >>>>>>>>>>>>>>>>>>> ValueTransformer and ValueJoiner.
> > >>>>>>>>>>>>>>>>>>> Please feel free to comment.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> > >>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
> > >>>>>>>>>>>>>>> matth...@confluent.io>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we
> don't
> > >>>> need
> > >>>>>>>>> the
> > >>>>>>>>>>> new
> > >>>>>>>>>>>>>>>>>>>> overlaod.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> And yes, we need to do one check -- but this happens
> > >> when
> > >>>>>>>>>>> building
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> topology. At runtime (I mean after
> KafkaStream#start()
> > >> we
> > >>>>>>>>> don't
> > >>>>>>>>>>>>> need
> > >>>>>>>>>>>>>>>> any
> > >>>>>>>>>>>>>>>>>>>> check as we will always use `ValueMapperWithKey`)
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks for feedback.
> > >>>>>>>>>>>>>>>>>>>>> Then we need to overload method
> > >>>>>>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super
> > V,
> > >> ?
> > >>>>>>>>>> extends
> > >>>>>>>>>>>>>>> VR>
> > >>>>>>>>>>>>>>>>>>>>> mapper);
> > >>>>>>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>>   <VR> KStream<K, VR>
> mapValues(ValueMapperWithKey<?
> > >>>> super
> > >>>>>>>>> V,
> > >>>>>>>>>> ?
> > >>>>>>>>>>>>>>>> extends
> > >>>>>>>>>>>>>>>>>>>> VR>
> > >>>>>>>>>>>>>>>>>>>>> mapper);
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> and in runtime (inside processor) we still have to
> > >> check
> > >>>> it
> > >>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>> ValueMapper
> > >>>>>>>>>>>>>>>>>>>>> or ValueMapperWithKey before wrapping it into the
> > rich
> > >>>>>>>>>> function.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
> > >>>>>>>>>>>>>>>>>>>>> michal.borowie...@openbet.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> +1 :)
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
> > >>>>>>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> I was reading the updated KIP and I am wondering,
> > if
> > >> we
> > >>>>>>>>>> should
> > >>>>>>>>>>>>> do
> > >>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> design a little different.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Instead of distinguishing between a RichFunction
> > and
> > >>>>>>>>>>>>>>>> non-RichFunction
> > >>>>>>>>>>>>>>>>>>>> at
> > >>>>>>>>>>>>>>>>>>>>>>> runtime level, we would use RichFunctions all the
> > >> time.
> > >>>>>>>>>> Thus,
> > >>>>>>>>>>> on
> > >>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> DSL
> > >>>>>>>>>>>>>>>>>>>>>>> entry level, if a user provides a
> non-RichFunction,
> > >> we
> > >>>>>>>>> wrap
> > >>>>>>>>>> it
> > >>>>>>>>>>>>>>> by a
> > >>>>>>>>>>>>>>>>>>>>>>> RichFunction that is fully implemented by
> Streams.
> > >> This
> > >>>>>>>>>>>>>>>> RichFunction
> > >>>>>>>>>>>>>>>>>>>>>>> would just forward the call omitting the key:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Just to sketch the idea (incomplete code
> snippet):
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> public StreamsRichValueMapper implements
> > >>>>>>>>> RichValueMapper()
> > >>>>>>>>>> {
> > >>>>>>>>>>>>>>>>>>>>>>>>    private ValueMapper userProvidedMapper; //
> set
> > by
> > >>>>>>>>>>>>> constructor
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>    public VR apply(final K key, final V1 value1,
> > >>>> final V2
> > >>>>>>>>>>>>>>> value2)
> > >>>>>>>>>>>>>>>> {
> > >>>>>>>>>>>>>>>>>>>>>>>>        return userProvidedMapper(value1,
> value2);
> > >>>>>>>>>>>>>>>>>>>>>>>>    }
> > >>>>>>>>>>>>>>>>>>>>>>>> }
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>  From a performance point of view, I am not sure
> if
> > >> the
> > >>>>>>>>>>>>>>>>>>>>>>> "if(isRichFunction)" including casts etc would
> have
> > >>>> more
> > >>>>>>>>>>>>> overhead
> > >>>>>>>>>>>>>>>>>> than
> > >>>>>>>>>>>>>>>>>>>>>>> this approach (we would do more nested method
> call
> > >> for
> > >>>>>>>>>>>>>>>>>> non-RichFunction
> > >>>>>>>>>>>>>>>>>>>>>>> which should be more common than RichFunctions).
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> This approach should not effect lambdas (or do I
> > miss
> > >>>>>>>>>>>>> something?)
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>> might be cleaner, as we could have one more top
> > level
> > >>>>>>>>>>> interface
> > >>>>>>>>>>>>>>>>>>>>>>> `RichFunction` with methods `init()` and
> `close()`
> > >> and
> > >>>>>>>>> also
> > >>>>>>>>>>>>>>>>>> interfaces
> > >>>>>>>>>>>>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract
> > classes
> > >>>>>>>>>>> required).
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to
> > >> include
> > >>>>>>>>> rich
> > >>>>>>>>>>>>>>>>>> functions.
> > >>>>>>>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>>>>>>>>> will still have to evaluate the cost of deep
> > copying
> > >>>> of
> > >>>>>>>>>> keys.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
> > >>>>>>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com>
> > >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Hey Matthias,
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> My opinion would be that documenting the
> > >>>> immutability of
> > >>>>>>>>>> the
> > >>>>>>>>>>>>>>> key
> > >>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>> best approach available.  Better than requiring
> > the
> > >>>> key
> > >>>>>>>>> to
> > >>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>> serializable
> > >>>>>>>>>>>>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no
> > >>>> performance
> > >>>>>>>>>> risk.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> It'd be different if Java had immutable type
> > >>>> constraints
> > >>>>>>>>>> of
> > >>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>>>>>> kind.
> > >>>>>>>>>>>>>>>>>>>>>> :-)
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J.
> Sax
> > <
> > >>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this
> > path,
> > >>>> it
> > >>>>>>>>>>> should
> > >>>>>>>>>>>>>>>> cover
> > >>>>>>>>>>>>>>>>>>>>>>>>>> all(?) DSL interfaces.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> About guarding the key -- I am still not sure
> > what
> > >>>> to
> > >>>>>>>>> do
> > >>>>>>>>>>>>> about
> > >>>>>>>>>>>>>>>>>> it...
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it might be enough to document it (and
> > name
> > >>>> the
> > >>>>>>>>> key
> > >>>>>>>>>>>>>>>>>> parameter
> > >>>>>>>>>>>>>>>>>>>>>> like
> > >>>>>>>>>>>>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear).
> > Ultimately,
> > >> I
> > >>>>>>>>> would
> > >>>>>>>>>>>>>>> prefer
> > >>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>>>> guard against any modification, but I have no
> > good
> > >>>> idea
> > >>>>>>>>>> how
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> do
> > >>>>>>>>>>>>>>>>>>>>>> this.
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Not sure what others think about the risk of
> > >>>> corrupted
> > >>>>>>>>>>>>>>>>>> partitioning
> > >>>>>>>>>>>>>>>>>>>>>>>>>> (what would be a user error and we could say,
> > >> well,
> > >>>> bad
> > >>>>>>>>>>> luck,
> > >>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>> got
> > >>>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs
> deep
> > >>>> copy
> > >>>>>>>>>> with
> > >>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>> potential
> > >>>>>>>>>>>>>>>>>>>>>>>>>> performance hit (that we can't quantity atm
> > >> without
> > >>>> any
> > >>>>>>>>>>>>>>>>>> performance
> > >>>>>>>>>>>>>>>>>>>>>>>>> test).
> > >>>>>>>>>>>>>>>>>>>>>>>>>> We do have a performance system test. Maybe
> it's
> > >>>> worth
> > >>>>>>>>>> for
> > >>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>> apply
> > >>>>>>>>>>>>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's
> > very
> > >>>>>>>>> basic
> > >>>>>>>>>>>>>>>>>> performance
> > >>>>>>>>>>>>>>>>>>>>>>>>>> test only, but might give some insight. If you
> > >> want
> > >>>> to
> > >>>>>>>>> do
> > >>>>>>>>>>>>>>> this,
> > >>>>>>>>>>>>>>>>>> look
> > >>>>>>>>>>>>>>>>>>>>>>>>>> into folder "tests" for general test setup,
> and
> > >> into
> > >>>>>>>>>>>>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find
> > find
> > >>>> the
> > >>>>>>>>>> perf
> > >>>>>>>>>>>>>>>> test.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Matthias,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I think extending KIP to include
> RichFunctions
> > >>>> totally
> > >>>>>>>>>>>>> makes
> > >>>>>>>>>>>>>>>>>>>> sense.
> > >>>>>>>>>>>>>>>>>>>>>>>>> So,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>   we don't want to guard the keys because it
> is
> > >>>>>>>>> costly.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it
> should
> > >>>> not be
> > >>>>>>>>>>>>>>> limited
> > >>>>>>>>>>>>>>>>>> only
> > >>>>>>>>>>>>>>>>>>>>>>>>> the 3
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range
> of
> > >>>>>>>>>>> interfaces.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J.
> > Sax <
> > >>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> One follow up. There was this email on the
> > user
> > >>>> list:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
> > >>>>>>>>>>>>>>> key+
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> It might make sense so include Initializer,
> > >> Adder,
> > >>>>>>>>> and
> > >>>>>>>>>>>>>>>>>> Substractor
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> inferface, too.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> And we should double check if there are
> other
> > >>>>>>>>> interface
> > >>>>>>>>>>> we
> > >>>>>>>>>>>>>>>> might
> > >>>>>>>>>>>>>>>>>>>>>> miss
> > >>>>>>>>>>>>>>>>>>>>>>>>>> atm.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure,
> but
> > I
> > >> am
> > >>>>>>>>>>> actually
> > >>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>> little
> > >>>>>>>>>>>>>>>>>>>>>>>>> bit
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> worried about performance impact... We
> might
> > >>>> want to
> > >>>>>>>>>> do
> > >>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>>>> test
> > >>>>>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> quantify this impact.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of
> > >>>> `RichFunction`
> > >>>>>>>>>>>>>>>> interface
> > >>>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> would allow users to access record metadata
> > >> (like
> > >>>>>>>>>>>>>>> timestamp,
> > >>>>>>>>>>>>>>>>>>>>>> offset,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a
> > >>>> similar
> > >>>>>>>>>>>>> concept.
> > >>>>>>>>>>>>>>>>>>>> Thus, I
> > >>>>>>>>>>>>>>>>>>>>>>>>> am
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wondering, if it would make sense to
> enlarge
> > >> the
> > >>>>>>>>> scope
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>>>>>>> by
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that? WDYT?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mathieu,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar
> > >> approach
> > >>>>>>>>> and
> > >>>>>>>>>>>>>>> updated
> > >>>>>>>>>>>>>>>>>> PR
> > >>>>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in
> > >>>> Processors
> > >>>>>>>>>>>>>>> sending
> > >>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>> copy
> > >>>>>>>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>>>>> an
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual key.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an
> object, I
> > >>>> think
> > >>>>>>>>>>> memory
> > >>>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> bottleneck
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in some use-cases.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu
> > >> Fenniak <
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> mathieu.fenn...@replicon.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper
> > >>>> (...etc) to
> > >>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>> classes,
> > >>>>>>>>>>>>>>>>>>>>>>>>> rather
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> than
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards
> > >>>> incompatible
> > >>>>>>>>>>>>>>> change.
> > >>>>>>>>>>>>>>>>>> An
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> alternative
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach that would be backwards
> compatible
> > >>>> would
> > >>>>>>>>> be
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> define
> > >>>>>>>>>>>>>>>>>>>>>> new
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where
> > those
> > >>>>>>>>>>> interfaces
> > >>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>>> used.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter
> as
> > >>>> "final"
> > >>>>>>>>>>>>>>> doesn't
> > >>>>>>>>>>>>>>>>>>>> change
> > >>>>>>>>>>>>>>>>>>>>>>>>>> much
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about guarding against key change.  It
> only
> > >>>>>>>>> prevents
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> parameter
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> variable
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type
> is
> > a
> > >>>>>>>>> mutable
> > >>>>>>>>>>>>>>> object
> > >>>>>>>>>>>>>>>>>>>> (eg.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> byte[]),
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] =
> 0).
> > >> But
> > >>>>>>>>> I'm
> > >>>>>>>>>>> not
> > >>>>>>>>>>>>>>>>>> really
> > >>>>>>>>>>>>>>>>>>>>>> sure
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> there's
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> much that can be done about that.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun
> > >> Karimov
> > >>>> <
> > >>>>>>>>>>>>>>>>>>>>>>>>> je.kari...@gmail.com
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we
> can
> > >>>> guard
> > >>>>>>>>> for
> > >>>>>>>>>>>>>>>>>> immutable
> > >>>>>>>>>>>>>>>>>>>>>> keys
> > >>>>>>>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current implementation (with few
> > changes), I
> > >>>>>>>>> didn't
> > >>>>>>>>>>>>>>>> consider
> > >>>>>>>>>>>>>>>>>>>>>>>>>> backward
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my
> mind.
> > In
> > >>>> both
> > >>>>>>>>>>>>> cases,
> > >>>>>>>>>>>>>>>>>> user
> > >>>>>>>>>>>>>>>>>>>>>>>>>> accesses
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra
> type
> > >>>>>>>>> parameter
> > >>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>> break
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has
> to
> > >> cast
> > >>>> to
> > >>>>>>>>>>> actual
> > >>>>>>>>>>>>>>>> key
> > >>>>>>>>>>>>>>>>>>>>>> type.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
>
> --
> -Cheers
>
> Jeyhun
>



-- 
-- Guozhang

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to