Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Jeyhun Karimov Thu, 18 May 2017 16:50:32 -0700

Hi,

Thanks. I initiated the PR as well, to get a better overview.


The only reason that I used abstract class instead of interface for Rich
functions is that in future if we will have some AbstractRichFunction
abstract classes,
we can easily extend:

public abstract class RichValueMapper<K, V, VR>  implements
ValueMapperWithKey<K, V, VR>, RichFunction *extends  AbstractRichFunction*{
}
 With interfaces we are only limited to interfaces for inheritance.

However, I think we can easily change it (from abstract class -> interface)
if you think interface is a better fit.


Cheers,
Jeyhun


On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax <[email protected]>
wrote:

> Thanks for the update and explanations. The KIP is quite improved now --
> great job!
>
> One more question: Why are RichValueMapper etc abstract classes and not
> interfaces?
>
>
> Overall, I like the KIP a lot!
>
>
> -Matthias
>
>
> On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
> > Hi,
> >
> > Thanks for your comments.
> >
> > I think supporting Lambdas for `withKey` and `AbstractRichFunction`
> >> don't go together, as Lambdas are only supported for interfaces AFAIK.
> >
> >
> > Maybe I misunderstood your comment.
> > *withKey* and and *withOnlyValue* are interfaces. So they don't have
> direct
> > relation with *AbstractRichFunction*.
> > *withKey* and and *withOnlyValue* interfaces have only one  method , so
> we
> > can use lambdas.
> > Where does the *AbstractRichFunction* comes to play? Inside Rich
> functions.
> > And we use Rich functions in 2 places:
> >
> > 1. User doesn't use rich functions. Just regular *withKey* and and
> > *withOnlyValue* interfaces(both support lambdas) . We get those
> interfaces
> > and wrap into Rich function while building the topology, and send to
> > Processor.
> > 2. User does use rich functions (Rich functions implement *withKey*
> > interface). As a result no lamdas here by definition. In this case, while
> > building the topology we do a type check if the object type is
> > *withKey* or *RichFunction*.
> >
> > So *AbstractRichFunction* is just syntactic sugar for Rich functions and
> > does not affect using lambdas.
> >
> > Thus, if we want to support Lambdas for `withKey`, we need to have a
> >> interface approach like this
> >>   - RichFunction -> only adding init() and close()
> >>   - ValueMapper
> >>   - ValueMapperWithKey
> >>   - RichValueMapper extends ValueMapperWithKey, RichFunction
> >
> >
> > As I said above, currently we support lambdas for *withKey* interfaces as
> > well.  However, I agree with your idea and I will remove the
> > AbstractRichFunction from the design.
> >
> > As an alternative, we could argue, that it is sufficient to support
> >> Lambdas for the "plain" API only, but not for any "extended API". For
> >> this, RichFunction could add key+init+close and AbstractRichFunction
> >> would allow to only care about getting the key.
> >> Not sure, which one is better. I don't like the idea of more overloaded
> >> methods to get Lambdas for `withKey` interfaces too much because we have
> >> already so many overlaods. On the other hand, I do see value in
> >> supporting Lambdas for `withKey`.
> >
> >
> > Just to clarify, with current design we have only one extra overloaded
> > method per *withOnlyValue* interface:  which is *withKey* version of
> > particular interface.
> > We don't need extra overload for Rich function as Rich function
> implements
> > *withKey* interface as a result they have same type. We differentiate
> them
> > while building the topology.
> > We supported lambdas for *withKey* APIs because of the comment:
> >
> > @Jeyhun: I did not put any thought into this, but can we have a design
> >> that allows for both? Also, with regard to lambdas, it might make sense
> >> to allow for both `V -> newV` and `(K, V) -> newV` ?
> >
> >
> > However, I don't think that this complicates the overall design
> > significantly.
> >
> >
> > Depending on what we want to support, it might make sense to
> >> include/exclude RichFunctions from this KIP -- and thus, this also
> >> determines if we should have a "ProcessorContext KIP" before driving
> >> this KIP further.
> >
> >
> > Based on our discussion I think we should keep Rich functions as I don't
> > think that they bring extra layer of overhead to library.
> >
> > Any comments are appreciated.
> >
> > Cheers,
> > Jeyhun
> >
> >
> > On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <[email protected]>
> > wrote:
> >
> >> Jeyhun,
> >>
> >> thanks for the update.
> >>
> >> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
> >> don't go together, as Lambdas are only supported for interfaces AFAIK.
> >>
> >> Thus, if we want to support Lambdas for `withKey`, we need to have a
> >> interface approach like this
> >>
> >>   - RichFunction -> only adding init() and close()
> >>
> >>   - ValueMapper
> >>   - ValueMapperWithKey
> >>
> >>   - RichValueMapper extends ValueMapperWithKey, RichFunction
> >>
> >> For this approach, AbstractRichFunction does not make sense anymore, as
> >> the only purpose of `RichFunction` is to allow the implementation of
> >> init() and close() -- if you don't want those, you would implement a
> >> different interface (ie, ValueMapperWithKey)
> >>
> >> As an alternative, we could argue, that it is sufficient to support
> >> Lambdas for the "plain" API only, but not for any "extended API". For
> >> this, RichFunction could add key+init+close and AbstractRichFunction
> >> would allow to only care about getting the key.
> >>
> >> Not sure, which one is better. I don't like the idea of more overloaded
> >> methods to get Lambdas for `withKey` interfaces too much because we have
> >> already so many overlaods. On the other hand, I do see value in
> >> supporting Lambdas for `withKey`.
> >>
> >> Depending on what we want to support, it might make sense to
> >> include/exclude RichFunctions from this KIP -- and thus, this also
> >> determines if we should have a "ProcessorContext KIP" before driving
> >> this KIP further.
> >>
> >> Thoughts?
> >>
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
> >>> Hi,
> >>>
> >>> Sorry for super late response. Thanks for your comments.
> >>>
> >>> I am not an expert on Lambdas. Can you elaborate a little bit? I cannot
> >>>> follow the explanation in the KIP to see what the problem is.
> >>>
> >>>
> >>> - From [1] says "A functional interface is an interface that has just
> one
> >>> abstract method, and thus represents a single function contract".
> >>> So basically once we extend some interface from another (in our case,
> >>> ValueMapperWithKey from ValueMapper) we cannot use lambdas in the
> >> extended
> >>> interface.
> >>>
> >>>
> >>> Further comments:
> >>>>  - The KIP get a little hard to read -- can you maybe reformat the
> wiki
> >>>> page a little bit? I think using `CodeBlock` would help.
> >>>
> >>>
> >>> - I will work on the KIP.
> >>>
> >>>  - What about KStream-KTable joins? You don't have overlaods added for
> >>>> them. Why? (Even if I still hope that we don't need to add any new
> >>>> overloads)
> >>>
> >>>
> >>> - Actually there are more than one Processor and public APIs to be
> >>> changed (KStream-KTable
> >>> joins is one case). However all of them has similar structure: we
> >> overload
> >>> the *method* with  *methodWithKey*,
> >>> wrap it into the Rich function, send to processor and inside the
> >> processor
> >>> call *init* and *close* methods of the Rich function.
> >>> As I wrote in KIP, I wanted to demonstrate the overall idea with only
> >>> *ValueMapper* as the same can be applied to all changes.
> >>> Anyway I will update the KIP.
> >>>
> >>>  - Why do we need `AbstractRichFunction`?
> >>>
> >>>
> >>> Instead of overriding the *init(ProcessorContext p)* and* close()*
> >> methods
> >>> in every Rich function with empty body like:
> >>>
> >>> @Override
> >>> void init(ProcessorContext context) {}
> >>>
> >>> @Override
> >>> void close () {}
> >>>
> >>> I thought that we can override them once in *AbstractRichFunction* and
> >>> extent new Rich functions from *AbstractRichFunction*.
> >>> Basically this can eliminate code copy-paste and ease the maintenance.
> >>>
> >>>  - What about interfaces Initializer, ForeachAction, Merger, Predicate,
> >>>> Reducer? I don't want to say we should/need to add to all, but we
> should
> >>>> discuss all of them and add where it does make sense (e.g.,
> >>>> RichForachAction does make sense IMHO)
> >>>
> >>>
> >>> Definitely agree. As I said, the same technique applies to all this
> >>> interfaces and I didn't want to explode the KIP, just wanted to give
> the
> >>> overall intuition.
> >>> However, I will update the KIP as I said.
> >>>
> >>>
> >>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> `RichValueXX`
> >>>> in general -- but why can't we do all this with interfaces only?
> >>>
> >>>
> >>> Sure we can. However the main intuition is we should not force users to
> >>> implement *init(ProcessorContext)* and *close()* functions every time
> >> they
> >>> use Rich functions.
> >>> If one needs, she can override the respective methods. However, I am
> open
> >>> for discussion.
> >>>
> >>>
> >>> I'd rather not see the use of  `ProcessorContext` spread any further
> than
> >>>> it currently is. So maybe we need another KIP that is done before
> this?
> >>>> Otherwise i think the scope of this KIP is becoming too large.
> >>>
> >>>
> >>> That is good point. I wanted to make *init(ProcessorContext)* method
> >>> persistent among the library (which use ProcessorContext as an input),
> >>> therefore I put *ProcessorContext* as an input.
> >>> So the important question is that (as @dguy and @mjsax mentioned)
> whether
> >>> continue this KIP without providing users an access to
> *ProcessorContext*
> >>> (change *init (ProcessorContext)* to * init()* ) or
> >>> initiate another KIP before this.
> >>>
> >>> [1]
> >>> http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
> >> se-8-jls-pfd-diffs.pdf
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
> >>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <[email protected]>
> >> wrote:
> >>>
> >>>> I'd rather not see the use of  `ProcessorContext` spread any further
> >> than
> >>>> it currently is. So maybe we need another KIP that is done before
> this?
> >>>> Otherwise i think the scope of this KIP is becoming too large.
> >>>>
> >>>>
> >>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> I agree that that `ProcessorContext` interface is too broad in
> general
> >>>>> -- this is even true for transform/process, and it's also reflected
> in
> >>>>> the API improvement list we want to do.
> >>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>>> Kafka+Streams+Discussions
> >>>>>
> >>>>> So I am wondering, if you question the `RichFunction` approach in
> >>>>> general? Or if you suggest to either extend the scope of this KIP to
> >>>>> include this---or maybe better, do another KIP for it and delay this
> >> KIP
> >>>>> until the other one is done?
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
> >>>>>> Thanks for the KIP.
> >>>>>>
> >>>>>> I'm not convinced on the `RichFunction` approach. Do we really want
> to
> >>>>> give
> >>>>>> every DSL method access to the `ProcessorContext` ? It has a bunch
> of
> >>>>>> methods on it that seem in-appropriate for some of the DSL methods,
> >>>> i.e,
> >>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. It is far
> too
> >>>>>> broad. I think it would be better to have a narrower interface like
> >> the
> >>>>>> `RecordContext`  - remembering it is easier to add
> methods/interfaces
> >>>>> later
> >>>>>> than to remove them
> >>>>>>
> >>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <[email protected]
> >
> >>>>> wrote:
> >>>>>>
> >>>>>>> Jeyhun,
> >>>>>>>
> >>>>>>> I am not an expert on Lambdas. Can you elaborate a little bit? I
> >>>> cannot
> >>>>>>> follow the explanation in the KIP to see what the problem is.
> >>>>>>>
> >>>>>>> For updating the KIP title I don't know -- guess it's up to you.
> >>>> Maybe a
> >>>>>>> committer can comment on this?
> >>>>>>>
> >>>>>>>
> >>>>>>> Further comments:
> >>>>>>>
> >>>>>>>  - The KIP get a little hard to read -- can you maybe reformat the
> >>>> wiki
> >>>>>>> page a little bit? I think using `CodeBlock` would help.
> >>>>>>>
> >>>>>>>  - What about KStream-KTable joins? You don't have overlaods added
> >> for
> >>>>>>> them. Why? (Even if I still hope that we don't need to add any new
> >>>>>>> overloads)
> >>>>>>>
> >>>>>>>  - Why do we need `AbstractRichFunction`?
> >>>>>>>
> >>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
> >>>> Predicate,
> >>>>>>> Reducer? I don't want to say we should/need to add to all, but we
> >>>> should
> >>>>>>> discuss all of them and add where it does make sense (e.g.,
> >>>>>>> RichForachAction does make sense IMHO)
> >>>>>>>
> >>>>>>>
> >>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> >>>> `RichValueXX`
> >>>>>>> in general -- but why can't we do all this with interfaces only?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Thanks for your comments. I think we cannot extend the two
> >> interfaces
> >>>>> if
> >>>>>>> we
> >>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I should change
> >>>> the
> >>>>>>>> title, because now we are not limiting the KIP to only
> ValueMapper,
> >>>>>>>> ValueTransformer and ValueJoiner.
> >>>>>>>> Please feel free to comment.
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >>>> ValueMapper%2C+and+ValueJoiner
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
> >>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't need the
> new
> >>>>>>>>> overlaod.
> >>>>>>>>>
> >>>>>>>>> And yes, we need to do one check -- but this happens when
> building
> >>>> the
> >>>>>>>>> topology. At runtime (I mean after KafkaStream#start() we don't
> >> need
> >>>>> any
> >>>>>>>>> check as we will always use `ValueMapperWithKey`)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for feedback.
> >>>>>>>>>> Then we need to overload method
> >>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ? extends
> >>>> VR>
> >>>>>>>>>> mapper);
> >>>>>>>>>> with
> >>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<? super V, ?
> >>>>> extends
> >>>>>>>>> VR>
> >>>>>>>>>> mapper);
> >>>>>>>>>>
> >>>>>>>>>> and in runtime (inside processor) we still have to check it is
> >>>>>>>>> ValueMapper
> >>>>>>>>>> or ValueMapperWithKey before wrapping it into the rich function.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please correct me if I am wrong.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Jeyhun
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
> >>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1 :)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I was reading the updated KIP and I am wondering, if we should
> >> do
> >>>>> the
> >>>>>>>>>>>> design a little different.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of distinguishing between a RichFunction and
> >>>>> non-RichFunction
> >>>>>>>>> at
> >>>>>>>>>>>> runtime level, we would use RichFunctions all the time. Thus,
> on
> >>>>> the
> >>>>>>>>> DSL
> >>>>>>>>>>>> entry level, if a user provides a non-RichFunction, we wrap it
> >>>> by a
> >>>>>>>>>>>> RichFunction that is fully implemented by Streams. This
> >>>>> RichFunction
> >>>>>>>>>>>> would just forward the call omitting the key:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just to sketch the idea (incomplete code snippet):
> >>>>>>>>>>>>
> >>>>>>>>>>>>> public StreamsRichValueMapper implements RichValueMapper() {
> >>>>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
> >> constructor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>    public VR apply(final K key, final V1 value1, final V2
> >>>> value2)
> >>>>> {
> >>>>>>>>>>>>>        return userProvidedMapper(value1, value2);
> >>>>>>>>>>>>>    }
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  From a performance point of view, I am not sure if the
> >>>>>>>>>>>> "if(isRichFunction)" including casts etc would have more
> >> overhead
> >>>>>>> than
> >>>>>>>>>>>> this approach (we would do more nested method call for
> >>>>>>> non-RichFunction
> >>>>>>>>>>>> which should be more common than RichFunctions).
> >>>>>>>>>>>>
> >>>>>>>>>>>> This approach should not effect lambdas (or do I miss
> >> something?)
> >>>>> and
> >>>>>>>>>>>> might be cleaner, as we could have one more top level
> interface
> >>>>>>>>>>>> `RichFunction` with methods `init()` and `close()` and also
> >>>>>>> interfaces
> >>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes
> required).
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any thoughts?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to include rich
> >>>>>>> functions.
> >>>>>>>>> I
> >>>>>>>>>>>>> will still have to evaluate the cost of deep copying of keys.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
> >>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey Matthias,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> My opinion would be that documenting the immutability of the
> >>>> key
> >>>>> is
> >>>>>>>>> the
> >>>>>>>>>>>>>> best approach available.  Better than requiring the key to
> be
> >>>>>>>>>>> serializable
> >>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no performance risk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It'd be different if Java had immutable type constraints of
> >>>> some
> >>>>>>>>> kind.
> >>>>>>>>>>> :-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
> >>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this path, it
> should
> >>>>> cover
> >>>>>>>>>>>>>>> all(?) DSL interfaces.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> About guarding the key -- I am still not sure what to do
> >> about
> >>>>>>> it...
> >>>>>>>>>>>>>>> Maybe it might be enough to document it (and name the key
> >>>>>>> parameter
> >>>>>>>>>>> like
> >>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately, I would
> >>>> prefer
> >>>>>>> to
> >>>>>>>>>>>>>>> guard against any modification, but I have no good idea how
> >> to
> >>>>> do
> >>>>>>>>>>> this.
> >>>>>>>>>>>>>>> Not sure what others think about the risk of corrupted
> >>>>>>> partitioning
> >>>>>>>>>>>>>>> (what would be a user error and we could say, well, bad
> luck,
> >>>>> you
> >>>>>>>>> got
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>> bug in your code, that's not our fault), vs deep copy with
> a
> >>>>>>>>> potential
> >>>>>>>>>>>>>>> performance hit (that we can't quantity atm without any
> >>>>>>> performance
> >>>>>>>>>>>>>> test).
> >>>>>>>>>>>>>>> We do have a performance system test. Maybe it's worth for
> >> you
> >>>>> to
> >>>>>>>>>>> apply
> >>>>>>>>>>>>>>> the deep copy strategy and run the test. It's very basic
> >>>>>>> performance
> >>>>>>>>>>>>>>> test only, but might give some insight. If you want to do
> >>>> this,
> >>>>>>> look
> >>>>>>>>>>>>>>> into folder "tests" for general test setup, and into
> >>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find the perf
> >>>>> test.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think extending KIP to include RichFunctions totally
> >> makes
> >>>>>>>>> sense.
> >>>>>>>>>>>>>> So,
> >>>>>>>>>>>>>>>>   we don't want to guard the keys because it is costly.
> >>>>>>>>>>>>>>>> If we introduce RichFunctions I think it should not be
> >>>> limited
> >>>>>>> only
> >>>>>>>>>>>>>> the 3
> >>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of
> interfaces.
> >>>>>>>>>>>>>>>> Please correct me if I am wrong.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
> >>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> One follow up. There was this email on the user list:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
> >>>>>>>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
> >>>> key+
> >>>>>>>>>>>>>>>>> It might make sense so include Initializer, Adder, and
> >>>>>>> Substractor
> >>>>>>>>>>>>>>>>> inferface, too.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> And we should double check if there are other interface
> we
> >>>>> might
> >>>>>>>>>>> miss
> >>>>>>>>>>>>>>> atm.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
> >>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I am
> actually
> >>>> a
> >>>>>>>>> little
> >>>>>>>>>>>>>> bit
> >>>>>>>>>>>>>>>>>> worried about performance impact... We might want to do
> >>>> some
> >>>>>>> test
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> quantify this impact.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of `RichFunction`
> >>>>> interface
> >>>>>>>>> that
> >>>>>>>>>>>>>>>>>> would allow users to access record metadata (like
> >>>> timestamp,
> >>>>>>>>>>> offset,
> >>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a similar
> >> concept.
> >>>>>>>>> Thus, I
> >>>>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge the scope
> of
> >>>>> this
> >>>>>>>>> KIP
> >>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>> that? WDYT?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>>>> Hi Mathieu,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar approach and
> >>>> updated
> >>>>>>> PR
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in Processors
> >>>> sending
> >>>>> a
> >>>>>>>>> copy
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>> actual key.
> >>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I think
> memory
> >>>>> can
> >>>>>>> be
> >>>>>>>>>>>>>>>>> bottleneck
> >>>>>>>>>>>>>>>>>>> in some use-cases.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu Fenniak <
> >>>>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper (...etc) to be
> >>>>>>> classes,
> >>>>>>>>>>>>>> rather
> >>>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards incompatible
> >>>> change.
> >>>>>>> An
> >>>>>>>>>>>>>>>>> alternative
> >>>>>>>>>>>>>>>>>>>> approach that would be backwards compatible would be
> to
> >>>>>>> define
> >>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those
> interfaces
> >>>>> are
> >>>>>>>>>>> used.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as "final"
> >>>> doesn't
> >>>>>>>>> change
> >>>>>>>>>>>>>>> much
> >>>>>>>>>>>>>>>>>>>> about guarding against key change.  It only prevents
> the
> >>>>>>>>>>> parameter
> >>>>>>>>>>>>>>>>> variable
> >>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a mutable
> >>>> object
> >>>>>>>>> (eg.
> >>>>>>>>>>>>>>>>> byte[]),
> >>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).  But I'm
> not
> >>>>>>> really
> >>>>>>>>>>> sure
> >>>>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>>>>>> much that can be done about that.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun Karimov <
> >>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for comments.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can guard for
> >>>>>>> immutable
> >>>>>>>>>>> keys
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>> current implementation (with few changes), I didn't
> >>>>> consider
> >>>>>>>>>>>>>>> backward
> >>>>>>>>>>>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In both
> >> cases,
> >>>>>>> user
> >>>>>>>>>>>>>>> accesses
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type parameter
> >> will
> >>>>>>> break
> >>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to cast to
> actual
> >>>>> key
> >>>>>>>>>>> type.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method with 2
> >> argument
> >>>>>>> (key
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> value)
> >>>>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing this,  I think
> we
> >>>>> can
> >>>>>>>>>>>>>> address
> >>>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against key
> change.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess like:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> public class KeyAccess {
> >>>>>>>>>>>>>>>>>>>>>      Object key;
> >>>>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object key) {
> >>>>>>>>>>>>>>>>>>>>>          this.key = key;
> >>>>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>>>>      public Object getKey() {
> >>>>>>>>>>>>>>>>>>>>>          return key;
> >>>>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
> >>>>>>>>> *ValueTransformer*
> >>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
> >>>>>>>>>>>>>>> *KTableMapValuesProcessor*)
> >>>>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can set the
> >>>> *key*
> >>>>> by
> >>>>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user can use
> >>>>>>>>> *getKey()*
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J. Sax <
> >>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I think there are two issues we need to address:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward
> compatibility.
> >>>>> Users
> >>>>>>>>> did
> >>>>>>>>>>>>>>>>>>>> complain
> >>>>>>>>>>>>>>>>>>>>>> about this in past releases already, and as the user
> >>>> base
> >>>>>>>>>>> grows,
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>> should not break backward compatibility in future
> >>>>> releases
> >>>>>>>>>>>>>> anymore.
> >>>>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to allow key
> >>>>> access.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same direction
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
> >>>> that
> >>>>>>>>> would
> >>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
> >>>>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no guard to
> >>>> prevent
> >>>>>>>>> user
> >>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt partitioning if
> >>>> users
> >>>>> do
> >>>>>>>>>>> alter
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> key (accidentally -- or users are just not aware
> that
> >>>>> they
> >>>>>>>>> are
> >>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>> allowed to modify the provided key object) and thus
> >>>> break
> >>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> application. (This was the original motivation to
> not
> >>>>>>> provide
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> key
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>> the first place -- it's guards against
> modification.)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On 5/1/17 6:31 AM, Mathieu Fenniak wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I just want to add my voice that, I too, have
> wished
> >>>> for
> >>>>>>>>>>> access
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> record key during a mapValues or similar operation.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
> >> that
> >>>>>>> would
> >>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)  But at
> >>>> least
> >>>>>>> it
> >>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>> pretty clear and easy change.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 6:55 AM, Jeyhun Karimov <
> >>>>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Dear community,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I want to share KIP-149 [1] based on issues
> >>>> KAFKA-4218
> >>>>>>> [2],
> >>>>>>>>>>>>>>>>>>>> KAFKA-4726
> >>>>>>>>>>>>>>>>>>>>>> [3],
> >>>>>>>>>>>>>>>>>>>>>>>> KAFKA-3745 [4]. The related PR can be found at
> [5].
> >>>>>>>>>>>>>>>>>>>>>>>> I would like to get your comments.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/
> >>>> confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >>>>>>>>>>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
> >>>>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira
> >> /browse/KAFKA-4218
> >>>>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira
> >> /browse/KAFKA-4726
> >>>>>>>>>>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira
> >> /browse/KAFKA-3745
> >>>>>>>>>>>>>>>>>>>>>>>> [5] https://github.com/apache/kafka/pull/2946
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>> -Cheers
> >>>>>>>>>>
> >>>>>>>>>> Jeyhun
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>> -Cheers
> >>>>>>>>
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
> --
-Cheers

Jeyhun

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to