Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Jeyhun Karimov Tue, 16 May 2017 07:04:20 -0700

Hi,

Thanks for your comments.


I think supporting Lambdas for `withKey` and `AbstractRichFunction`
> don't go together, as Lambdas are only supported for interfaces AFAIK.


Maybe I misunderstood your comment.
*withKey* and and *withOnlyValue* are interfaces. So they don't have direct
relation with *AbstractRichFunction*.
*withKey* and and *withOnlyValue* interfaces have only one  method , so we
can use lambdas.
Where does the *AbstractRichFunction* comes to play? Inside Rich functions.
And we use Rich functions in 2 places:

1. User doesn't use rich functions. Just regular *withKey* and and
*withOnlyValue* interfaces(both support lambdas) . We get those interfaces
and wrap into Rich function while building the topology, and send to
Processor.
2. User does use rich functions (Rich functions implement *withKey*
interface). As a result no lamdas here by definition. In this case, while
building the topology we do a type check if the object type is
*withKey* or *RichFunction*.

So *AbstractRichFunction* is just syntactic sugar for Rich functions and
does not affect using lambdas.

Thus, if we want to support Lambdas for `withKey`, we need to have a
> interface approach like this
>   - RichFunction -> only adding init() and close()
>   - ValueMapper
>   - ValueMapperWithKey
>   - RichValueMapper extends ValueMapperWithKey, RichFunction


As I said above, currently we support lambdas for *withKey* interfaces as
well.  However, I agree with your idea and I will remove the
AbstractRichFunction from the design.

As an alternative, we could argue, that it is sufficient to support
> Lambdas for the "plain" API only, but not for any "extended API". For
> this, RichFunction could add key+init+close and AbstractRichFunction
> would allow to only care about getting the key.
> Not sure, which one is better. I don't like the idea of more overloaded
> methods to get Lambdas for `withKey` interfaces too much because we have
> already so many overlaods. On the other hand, I do see value in
> supporting Lambdas for `withKey`.


Just to clarify, with current design we have only one extra overloaded
method per *withOnlyValue* interface:  which is *withKey* version of
particular interface.
We don't need extra overload for Rich function as Rich function implements
*withKey* interface as a result they have same type. We differentiate them
while building the topology.
We supported lambdas for *withKey* APIs because of the comment:

@Jeyhun: I did not put any thought into this, but can we have a design
> that allows for both? Also, with regard to lambdas, it might make sense
> to allow for both `V -> newV` and `(K, V) -> newV` ?


However, I don't think that this complicates the overall design
significantly.


Depending on what we want to support, it might make sense to
> include/exclude RichFunctions from this KIP -- and thus, this also
> determines if we should have a "ProcessorContext KIP" before driving
> this KIP further.


Based on our discussion I think we should keep Rich functions as I don't
think that they bring extra layer of overhead to library.

Any comments are appreciated.

Cheers,
Jeyhun


On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <[email protected]>
wrote:

> Jeyhun,
>
> thanks for the update.
>
> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
> don't go together, as Lambdas are only supported for interfaces AFAIK.
>
> Thus, if we want to support Lambdas for `withKey`, we need to have a
> interface approach like this
>
>   - RichFunction -> only adding init() and close()
>
>   - ValueMapper
>   - ValueMapperWithKey
>
>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>
> For this approach, AbstractRichFunction does not make sense anymore, as
> the only purpose of `RichFunction` is to allow the implementation of
> init() and close() -- if you don't want those, you would implement a
> different interface (ie, ValueMapperWithKey)
>
> As an alternative, we could argue, that it is sufficient to support
> Lambdas for the "plain" API only, but not for any "extended API". For
> this, RichFunction could add key+init+close and AbstractRichFunction
> would allow to only care about getting the key.
>
> Not sure, which one is better. I don't like the idea of more overloaded
> methods to get Lambdas for `withKey` interfaces too much because we have
> already so many overlaods. On the other hand, I do see value in
> supporting Lambdas for `withKey`.
>
> Depending on what we want to support, it might make sense to
> include/exclude RichFunctions from this KIP -- and thus, this also
> determines if we should have a "ProcessorContext KIP" before driving
> this KIP further.
>
> Thoughts?
>
>
>
>
> -Matthias
>
>
> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
> > Hi,
> >
> > Sorry for super late response. Thanks for your comments.
> >
> > I am not an expert on Lambdas. Can you elaborate a little bit? I cannot
> >> follow the explanation in the KIP to see what the problem is.
> >
> >
> > - From [1] says "A functional interface is an interface that has just one
> > abstract method, and thus represents a single function contract".
> > So basically once we extend some interface from another (in our case,
> > ValueMapperWithKey from ValueMapper) we cannot use lambdas in the
> extended
> > interface.
> >
> >
> > Further comments:
> >>  - The KIP get a little hard to read -- can you maybe reformat the wiki
> >> page a little bit? I think using `CodeBlock` would help.
> >
> >
> > - I will work on the KIP.
> >
> >  - What about KStream-KTable joins? You don't have overlaods added for
> >> them. Why? (Even if I still hope that we don't need to add any new
> >> overloads)
> >
> >
> > - Actually there are more than one Processor and public APIs to be
> > changed (KStream-KTable
> > joins is one case). However all of them has similar structure: we
> overload
> > the *method* with  *methodWithKey*,
> > wrap it into the Rich function, send to processor and inside the
> processor
> > call *init* and *close* methods of the Rich function.
> > As I wrote in KIP, I wanted to demonstrate the overall idea with only
> > *ValueMapper* as the same can be applied to all changes.
> > Anyway I will update the KIP.
> >
> >  - Why do we need `AbstractRichFunction`?
> >
> >
> > Instead of overriding the *init(ProcessorContext p)* and* close()*
> methods
> > in every Rich function with empty body like:
> >
> > @Override
> > void init(ProcessorContext context) {}
> >
> > @Override
> > void close () {}
> >
> > I thought that we can override them once in *AbstractRichFunction* and
> > extent new Rich functions from *AbstractRichFunction*.
> > Basically this can eliminate code copy-paste and ease the maintenance.
> >
> >  - What about interfaces Initializer, ForeachAction, Merger, Predicate,
> >> Reducer? I don't want to say we should/need to add to all, but we should
> >> discuss all of them and add where it does make sense (e.g.,
> >> RichForachAction does make sense IMHO)
> >
> >
> > Definitely agree. As I said, the same technique applies to all this
> > interfaces and I didn't want to explode the KIP, just wanted to give the
> > overall intuition.
> > However, I will update the KIP as I said.
> >
> >
> > Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` -- `RichValueXX`
> >> in general -- but why can't we do all this with interfaces only?
> >
> >
> > Sure we can. However the main intuition is we should not force users to
> > implement *init(ProcessorContext)* and *close()* functions every time
> they
> > use Rich functions.
> > If one needs, she can override the respective methods. However, I am open
> > for discussion.
> >
> >
> > I'd rather not see the use of  `ProcessorContext` spread any further than
> >> it currently is. So maybe we need another KIP that is done before this?
> >> Otherwise i think the scope of this KIP is becoming too large.
> >
> >
> > That is good point. I wanted to make *init(ProcessorContext)* method
> > persistent among the library (which use ProcessorContext as an input),
> > therefore I put *ProcessorContext* as an input.
> > So the important question is that (as @dguy and @mjsax mentioned) whether
> > continue this KIP without providing users an access to *ProcessorContext*
> > (change *init (ProcessorContext)* to * init()* ) or
> > initiate another KIP before this.
> >
> > [1]
> > http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
> se-8-jls-pfd-diffs.pdf
> >
> >
> > Cheers,
> > Jeyhun
> >
> > On Mon, May 15, 2017 at 7:15 PM, Damian Guy <[email protected]>
> wrote:
> >
> >> I'd rather not see the use of  `ProcessorContext` spread any further
> than
> >> it currently is. So maybe we need another KIP that is done before this?
> >> Otherwise i think the scope of this KIP is becoming too large.
> >>
> >>
> >> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <[email protected]>
> >> wrote:
> >>
> >>> I agree that that `ProcessorContext` interface is too broad in general
> >>> -- this is even true for transform/process, and it's also reflected in
> >>> the API improvement list we want to do.
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/
> >> Kafka+Streams+Discussions
> >>>
> >>> So I am wondering, if you question the `RichFunction` approach in
> >>> general? Or if you suggest to either extend the scope of this KIP to
> >>> include this---or maybe better, do another KIP for it and delay this
> KIP
> >>> until the other one is done?
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 5/15/17 2:35 AM, Damian Guy wrote:
> >>>> Thanks for the KIP.
> >>>>
> >>>> I'm not convinced on the `RichFunction` approach. Do we really want to
> >>> give
> >>>> every DSL method access to the `ProcessorContext` ? It has a bunch of
> >>>> methods on it that seem in-appropriate for some of the DSL methods,
> >> i.e,
> >>>> `register`, `getStateStore`, `forward`, `schedule` etc. It is far too
> >>>> broad. I think it would be better to have a narrower interface like
> the
> >>>> `RecordContext`  - remembering it is easier to add methods/interfaces
> >>> later
> >>>> than to remove them
> >>>>
> >>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <[email protected]>
> >>> wrote:
> >>>>
> >>>>> Jeyhun,
> >>>>>
> >>>>> I am not an expert on Lambdas. Can you elaborate a little bit? I
> >> cannot
> >>>>> follow the explanation in the KIP to see what the problem is.
> >>>>>
> >>>>> For updating the KIP title I don't know -- guess it's up to you.
> >> Maybe a
> >>>>> committer can comment on this?
> >>>>>
> >>>>>
> >>>>> Further comments:
> >>>>>
> >>>>>  - The KIP get a little hard to read -- can you maybe reformat the
> >> wiki
> >>>>> page a little bit? I think using `CodeBlock` would help.
> >>>>>
> >>>>>  - What about KStream-KTable joins? You don't have overlaods added
> for
> >>>>> them. Why? (Even if I still hope that we don't need to add any new
> >>>>> overloads)
> >>>>>
> >>>>>  - Why do we need `AbstractRichFunction`?
> >>>>>
> >>>>>  - What about interfaces Initializer, ForeachAction, Merger,
> >> Predicate,
> >>>>> Reducer? I don't want to say we should/need to add to all, but we
> >> should
> >>>>> discuss all of them and add where it does make sense (e.g.,
> >>>>> RichForachAction does make sense IMHO)
> >>>>>
> >>>>>
> >>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
> >> `RichValueXX`
> >>>>> in general -- but why can't we do all this with interfaces only?
> >>>>>
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Thanks for your comments. I think we cannot extend the two
> interfaces
> >>> if
> >>>>> we
> >>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I should change
> >> the
> >>>>>> title, because now we are not limiting the KIP to only ValueMapper,
> >>>>>> ValueTransformer and ValueJoiner.
> >>>>>> Please feel free to comment.
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >> ValueMapper%2C+and+ValueJoiner
> >>>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Jeyhun
> >>>>>>
> >>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
> >> [email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't need the new
> >>>>>>> overlaod.
> >>>>>>>
> >>>>>>> And yes, we need to do one check -- but this happens when building
> >> the
> >>>>>>> topology. At runtime (I mean after KafkaStream#start() we don't
> need
> >>> any
> >>>>>>> check as we will always use `ValueMapperWithKey`)
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Thanks for feedback.
> >>>>>>>> Then we need to overload method
> >>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ? extends
> >> VR>
> >>>>>>>> mapper);
> >>>>>>>> with
> >>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<? super V, ?
> >>> extends
> >>>>>>> VR>
> >>>>>>>> mapper);
> >>>>>>>>
> >>>>>>>> and in runtime (inside processor) we still have to check it is
> >>>>>>> ValueMapper
> >>>>>>>> or ValueMapperWithKey before wrapping it into the rich function.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please correct me if I am wrong.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> +1 :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I was reading the updated KIP and I am wondering, if we should
> do
> >>> the
> >>>>>>>>>> design a little different.
> >>>>>>>>>>
> >>>>>>>>>> Instead of distinguishing between a RichFunction and
> >>> non-RichFunction
> >>>>>>> at
> >>>>>>>>>> runtime level, we would use RichFunctions all the time. Thus, on
> >>> the
> >>>>>>> DSL
> >>>>>>>>>> entry level, if a user provides a non-RichFunction, we wrap it
> >> by a
> >>>>>>>>>> RichFunction that is fully implemented by Streams. This
> >>> RichFunction
> >>>>>>>>>> would just forward the call omitting the key:
> >>>>>>>>>>
> >>>>>>>>>> Just to sketch the idea (incomplete code snippet):
> >>>>>>>>>>
> >>>>>>>>>>> public StreamsRichValueMapper implements RichValueMapper() {
> >>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
> constructor
> >>>>>>>>>>>
> >>>>>>>>>>>    public VR apply(final K key, final V1 value1, final V2
> >> value2)
> >>> {
> >>>>>>>>>>>        return userProvidedMapper(value1, value2);
> >>>>>>>>>>>    }
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>>  From a performance point of view, I am not sure if the
> >>>>>>>>>> "if(isRichFunction)" including casts etc would have more
> overhead
> >>>>> than
> >>>>>>>>>> this approach (we would do more nested method call for
> >>>>> non-RichFunction
> >>>>>>>>>> which should be more common than RichFunctions).
> >>>>>>>>>>
> >>>>>>>>>> This approach should not effect lambdas (or do I miss
> something?)
> >>> and
> >>>>>>>>>> might be cleaner, as we could have one more top level interface
> >>>>>>>>>> `RichFunction` with methods `init()` and `close()` and also
> >>>>> interfaces
> >>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes required).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Any thoughts?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -Matthias
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for comments. I extended PR and KIP to include rich
> >>>>> functions.
> >>>>>>> I
> >>>>>>>>>>> will still have to evaluate the cost of deep copying of keys.
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Jeyhun
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
> >>>>>>>>> [email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hey Matthias,
> >>>>>>>>>>>>
> >>>>>>>>>>>> My opinion would be that documenting the immutability of the
> >> key
> >>> is
> >>>>>>> the
> >>>>>>>>>>>> best approach available.  Better than requiring the key to be
> >>>>>>>>> serializable
> >>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no performance risk.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It'd be different if Java had immutable type constraints of
> >> some
> >>>>>>> kind.
> >>>>>>>>> :-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
> >>>>>>>>> [email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Agreed about RichFunction. If we follow this path, it should
> >>> cover
> >>>>>>>>>>>>> all(?) DSL interfaces.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About guarding the key -- I am still not sure what to do
> about
> >>>>> it...
> >>>>>>>>>>>>> Maybe it might be enough to document it (and name the key
> >>>>> parameter
> >>>>>>>>> like
> >>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately, I would
> >> prefer
> >>>>> to
> >>>>>>>>>>>>> guard against any modification, but I have no good idea how
> to
> >>> do
> >>>>>>>>> this.
> >>>>>>>>>>>>> Not sure what others think about the risk of corrupted
> >>>>> partitioning
> >>>>>>>>>>>>> (what would be a user error and we could say, well, bad luck,
> >>> you
> >>>>>>> got
> >>>>>>>>> a
> >>>>>>>>>>>>> bug in your code, that's not our fault), vs deep copy with a
> >>>>>>> potential
> >>>>>>>>>>>>> performance hit (that we can't quantity atm without any
> >>>>> performance
> >>>>>>>>>>>> test).
> >>>>>>>>>>>>> We do have a performance system test. Maybe it's worth for
> you
> >>> to
> >>>>>>>>> apply
> >>>>>>>>>>>>> the deep copy strategy and run the test. It's very basic
> >>>>> performance
> >>>>>>>>>>>>> test only, but might give some insight. If you want to do
> >> this,
> >>>>> look
> >>>>>>>>>>>>> into folder "tests" for general test setup, and into
> >>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find the perf
> >>> test.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think extending KIP to include RichFunctions totally
> makes
> >>>>>>> sense.
> >>>>>>>>>>>> So,
> >>>>>>>>>>>>>>   we don't want to guard the keys because it is costly.
> >>>>>>>>>>>>>> If we introduce RichFunctions I think it should not be
> >> limited
> >>>>> only
> >>>>>>>>>>>> the 3
> >>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of interfaces.
> >>>>>>>>>>>>>> Please correct me if I am wrong.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
> >>>>>>>>> [email protected]
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> One follow up. There was this email on the user list:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
> >>>>>>>>>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
> >> key+
> >>>>>>>>>>>>>>> It might make sense so include Initializer, Adder, and
> >>>>> Substractor
> >>>>>>>>>>>>>>> inferface, too.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> And we should double check if there are other interface we
> >>> might
> >>>>>>>>> miss
> >>>>>>>>>>>>> atm.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
> >>>>>>>>>>>>>>>> Thanks for updating the KIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I am actually
> >> a
> >>>>>>> little
> >>>>>>>>>>>> bit
> >>>>>>>>>>>>>>>> worried about performance impact... We might want to do
> >> some
> >>>>> test
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> quantify this impact.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Btw: this remind me about the idea of `RichFunction`
> >>> interface
> >>>>>>> that
> >>>>>>>>>>>>>>>> would allow users to access record metadata (like
> >> timestamp,
> >>>>>>>>> offset,
> >>>>>>>>>>>>>>>> partition etc) within DSL. This would be a similar
> concept.
> >>>>>>> Thus, I
> >>>>>>>>>>>> am
> >>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge the scope of
> >>> this
> >>>>>>> KIP
> >>>>>>>>> by
> >>>>>>>>>>>>>>>> that? WDYT?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
> >>>>>>>>>>>>>>>>> Hi Mathieu,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar approach and
> >> updated
> >>>>> PR
> >>>>>>>>> and
> >>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in Processors
> >> sending
> >>> a
> >>>>>>> copy
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>> actual key.
> >>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I think memory
> >>> can
> >>>>> be
> >>>>>>>>>>>>>>> bottleneck
> >>>>>>>>>>>>>>>>> in some use-cases.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu Fenniak <
> >>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> This approach would change ValueMapper (...etc) to be
> >>>>> classes,
> >>>>>>>>>>>> rather
> >>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>> interfaces, which is also a backwards incompatible
> >> change.
> >>>>> An
> >>>>>>>>>>>>>>> alternative
> >>>>>>>>>>>>>>>>>> approach that would be backwards compatible would be to
> >>>>> define
> >>>>>>>>> new
> >>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those interfaces
> >>> are
> >>>>>>>>> used.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as "final"
> >> doesn't
> >>>>>>> change
> >>>>>>>>>>>>> much
> >>>>>>>>>>>>>>>>>> about guarding against key change.  It only prevents the
> >>>>>>>>> parameter
> >>>>>>>>>>>>>>> variable
> >>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a mutable
> >> object
> >>>>>>> (eg.
> >>>>>>>>>>>>>>> byte[]),
> >>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).  But I'm not
> >>>>> really
> >>>>>>>>> sure
> >>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>>>> much that can be done about that.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun Karimov <
> >>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for comments.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can guard for
> >>>>> immutable
> >>>>>>>>> keys
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> current implementation (with few changes), I didn't
> >>> consider
> >>>>>>>>>>>>> backward
> >>>>>>>>>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In both
> cases,
> >>>>> user
> >>>>>>>>>>>>> accesses
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type parameter
> will
> >>>>> break
> >>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to cast to actual
> >>> key
> >>>>>>>>> type.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method with 2
> argument
> >>>>> (key
> >>>>>>>>> and
> >>>>>>>>>>>>>>> value)
> >>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing this,  I think we
> >>> can
> >>>>>>>>>>>> address
> >>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against key change.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess like:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> public class KeyAccess {
> >>>>>>>>>>>>>>>>>>>      Object key;
> >>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object key) {
> >>>>>>>>>>>>>>>>>>>          this.key = key;
> >>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>>      public Object getKey() {
> >>>>>>>>>>>>>>>>>>>          return key;
> >>>>>>>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
> >>>>>>> *ValueTransformer*
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
> >>>>>>>>>>>>> *KTableMapValuesProcessor*)
> >>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can set the
> >> *key*
> >>> by
> >>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user can use
> >>>>>>> *getKey()*
> >>>>>>>>> to
> >>>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J. Sax <
> >>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Jeyhun,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I think there are two issues we need to address:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward compatibility.
> >>> Users
> >>>>>>> did
> >>>>>>>>>>>>>>>>>> complain
> >>>>>>>>>>>>>>>>>>>> about this in past releases already, and as the user
> >> base
> >>>>>>>>> grows,
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>> should not break backward compatibility in future
> >>> releases
> >>>>>>>>>>>> anymore.
> >>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to allow key
> >>> access.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same direction
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
> >> that
> >>>>>>> would
> >>>>>>>>>>>> need
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
> >>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no guard to
> >> prevent
> >>>>>>> user
> >>>>>>>>>>>> code
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt partitioning if
> >> users
> >>> do
> >>>>>>>>> alter
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> key (accidentally -- or users are just not aware that
> >>> they
> >>>>>>> are
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>> allowed to modify the provided key object) and thus
> >> break
> >>>>> the
> >>>>>>>>>>>>>>>>>>>> application. (This was the original motivation to not
> >>>>> provide
> >>>>>>>>> the
> >>>>>>>>>>>>> key
> >>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>> the first place -- it's guards against modification.)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 5/1/17 6:31 AM, Mathieu Fenniak wrote:
> >>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I just want to add my voice that, I too, have wished
> >> for
> >>>>>>>>> access
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> record key during a mapValues or similar operation.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile failures
> that
> >>>>> would
> >>>>>>>>>>>> need
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)  But at
> >> least
> >>>>> it
> >>>>>>>>>>>> would
> >>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>> pretty clear and easy change.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Mathieu
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 6:55 AM, Jeyhun Karimov <
> >>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Dear community,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I want to share KIP-149 [1] based on issues
> >> KAFKA-4218
> >>>>> [2],
> >>>>>>>>>>>>>>>>>> KAFKA-4726
> >>>>>>>>>>>>>>>>>>>> [3],
> >>>>>>>>>>>>>>>>>>>>>> KAFKA-3745 [4]. The related PR can be found at [5].
> >>>>>>>>>>>>>>>>>>>>>> I would like to get your comments.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/
> >> confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
> >>>>>>>>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
> >>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira
> /browse/KAFKA-4218
> >>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira
> /browse/KAFKA-4726
> >>>>>>>>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira
> /browse/KAFKA-3745
> >>>>>>>>>>>>>>>>>>>>>> [5] https://github.com/apache/kafka/pull/2946
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> -Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jeyhun
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>> -Cheers
> >>>>>>>>
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>> -Cheers
> >>>>>>
> >>>>>> Jeyhun
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to