Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Matthias J. Sax Wed, 24 May 2017 15:47:58 -0700

Jeyhun,

I was just wondering if you did look into the key-deep-copy idea we
discussed. I am curious to see what the impact might be.



-Matthias

On 5/20/17 2:03 AM, Jeyhun Karimov wrote:
> Hi,
> 
> Thanks for your comments. I rethink about including rich functions into
> this KIP.
> I think once we include rich functions in this KIP and then fix
> ProcessorContext in another KIP and incorporate with existing rich
> functions, the code will not be backwards compatible.
> 
> I see Damian's and your point more clearly now.
> 
> Conclusion: we include only withKey interfaces in this KIP (I updated the
> KIP), I will try also initiate another KIP for rich functions.
> 
> Cheers,
> Jeyhun
> 
> On Fri, May 19, 2017 at 10:50 PM Matthias J. Sax <[email protected]>
> wrote:
> 
>> With the current KIP, using ValueMapper and ValueMapperWithKey
>> interfaces, RichFunction seems to be an independent add-on. To fix the
>> original issue to allow key access, RichFunctions are not required IMHO.
>>
>> I initially put the RichFunction idea on the table, because I was hoping
>> to get a uniform API. And I think, is was good to consider them --
>> however, the discussion showed that they are not necessary for key
>> access. Thus, it seems to be better to handle RichFunctions in an own
>> KIP. The ProcessorContext/RecordContext issues seems to be a main
>> challenge for this. And introducing RichFunctions with parameter-less
>> init() method, seem not to help too much. We would get an "intermediate"
>> API that we want to change anyway later on...
>>
>> As you put already much effort into RichFunction, feel free do push this
>> further and start a new KIP (we can do this even in parallel) -- we
>> don't want to slow you down :) But it make the discussion and code
>> review easier, if we separate both IMHO.
>>
>>
>> -Matthias
>>
>>
>> On 5/19/17 2:25 AM, Jeyhun Karimov wrote:
>>> Hi Damian,
>>>
>>> Thanks for your comments. I think providing to users *interface* rather
>>> than *abstract class* should be preferred (Matthias also raised this
>> issue
>>> ), anyway I changed the corresponding parts of KIP.
>>>
>>> Regarding with passing additional contextual information, I think it is a
>>> tradeoff,
>>> 1) first, we fix the context parameter for *init() *method in another PR
>>> and solve Rich functions afterwards
>>> 2) first, we fix the requested issues on jira ([1-3]) with providing
>>> (not-complete) Rich functions and integrate the context parameters to
>> this
>>> afterwards (like improvement)
>>>
>>> To me, the second approach seems more incremental. However you are right,
>>> the names might confuse the users.
>>>
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/KAFKA-4218
>>> [2] https://issues.apache.org/jira/browse/KAFKA-4726
>>> [3] https://issues.apache.org/jira/browse/KAFKA-3745
>>>
>>>
>>> Cheers,
>>> Jeyhun
>>>
>>>
>>> On Fri, May 19, 2017 at 10:42 AM Damian Guy <[email protected]>
>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I see you've removed the `ProcessorContext` from the RichFunction which
>> is
>>>> good, but why is it a `RichFunction`? I'd have expected it to pass some
>>>> additional contextual information, i.e., the `RecordContext` that
>> contains
>>>> just the topic, partition, timestamp, offset.  I'm ok with it not
>> passing
>>>> this contextual information, but is the naming incorrect? I'm not sure,
>>>> tbh. I'm wondering if we should drop `RichFunctions` until we can do it
>>>> properly with the correct context?
>>>>
>>>> Also, i don't like the abstract classes: RichValueMapper,
>> RichValueJoiner,
>>>> RichInitializer etc. Why can't they not just be interfaces? Generally we
>>>> should provide users with Intefaces to code against, not classes.
>>>>
>>>> Thanks,
>>>> Damian
>>>>
>>>> On Fri, 19 May 2017 at 00:50 Jeyhun Karimov <[email protected]>
>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks. I initiated the PR as well, to get a better overview.
>>>>>
>>>>> The only reason that I used abstract class instead of interface for
>> Rich
>>>>> functions is that in future if we will have some AbstractRichFunction
>>>>> abstract classes,
>>>>> we can easily extend:
>>>>>
>>>>> public abstract class RichValueMapper<K, V, VR>  implements
>>>>> ValueMapperWithKey<K, V, VR>, RichFunction *extends
>>>> AbstractRichFunction*{
>>>>> }
>>>>>  With interfaces we are only limited to interfaces for inheritance.
>>>>>
>>>>> However, I think we can easily change it (from abstract class ->
>>>> interface)
>>>>> if you think interface is a better fit.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Jeyhun
>>>>>
>>>>>
>>>>> On Fri, May 19, 2017 at 1:00 AM Matthias J. Sax <[email protected]
>>>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the update and explanations. The KIP is quite improved now
>>>> --
>>>>>> great job!
>>>>>>
>>>>>> One more question: Why are RichValueMapper etc abstract classes and
>> not
>>>>>> interfaces?
>>>>>>
>>>>>>
>>>>>> Overall, I like the KIP a lot!
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>> On 5/16/17 7:03 AM, Jeyhun Karimov wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for your comments.
>>>>>>>
>>>>>>> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
>>>>>>>> don't go together, as Lambdas are only supported for interfaces
>>>> AFAIK.
>>>>>>>
>>>>>>>
>>>>>>> Maybe I misunderstood your comment.
>>>>>>> *withKey* and and *withOnlyValue* are interfaces. So they don't have
>>>>>> direct
>>>>>>> relation with *AbstractRichFunction*.
>>>>>>> *withKey* and and *withOnlyValue* interfaces have only one  method ,
>>>> so
>>>>>> we
>>>>>>> can use lambdas.
>>>>>>> Where does the *AbstractRichFunction* comes to play? Inside Rich
>>>>>> functions.
>>>>>>> And we use Rich functions in 2 places:
>>>>>>>
>>>>>>> 1. User doesn't use rich functions. Just regular *withKey* and and
>>>>>>> *withOnlyValue* interfaces(both support lambdas) . We get those
>>>>>> interfaces
>>>>>>> and wrap into Rich function while building the topology, and send to
>>>>>>> Processor.
>>>>>>> 2. User does use rich functions (Rich functions implement *withKey*
>>>>>>> interface). As a result no lamdas here by definition. In this case,
>>>>> while
>>>>>>> building the topology we do a type check if the object type is
>>>>>>> *withKey* or *RichFunction*.
>>>>>>>
>>>>>>> So *AbstractRichFunction* is just syntactic sugar for Rich functions
>>>>> and
>>>>>>> does not affect using lambdas.
>>>>>>>
>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to have a
>>>>>>>> interface approach like this
>>>>>>>>   - RichFunction -> only adding init() and close()
>>>>>>>>   - ValueMapper
>>>>>>>>   - ValueMapperWithKey
>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>>>>>>>
>>>>>>>
>>>>>>> As I said above, currently we support lambdas for *withKey*
>>>> interfaces
>>>>> as
>>>>>>> well.  However, I agree with your idea and I will remove the
>>>>>>> AbstractRichFunction from the design.
>>>>>>>
>>>>>>> As an alternative, we could argue, that it is sufficient to support
>>>>>>>> Lambdas for the "plain" API only, but not for any "extended API".
>>>> For
>>>>>>>> this, RichFunction could add key+init+close and AbstractRichFunction
>>>>>>>> would allow to only care about getting the key.
>>>>>>>> Not sure, which one is better. I don't like the idea of more
>>>>> overloaded
>>>>>>>> methods to get Lambdas for `withKey` interfaces too much because we
>>>>> have
>>>>>>>> already so many overlaods. On the other hand, I do see value in
>>>>>>>> supporting Lambdas for `withKey`.
>>>>>>>
>>>>>>>
>>>>>>> Just to clarify, with current design we have only one extra
>>>> overloaded
>>>>>>> method per *withOnlyValue* interface:  which is *withKey* version of
>>>>>>> particular interface.
>>>>>>> We don't need extra overload for Rich function as Rich function
>>>>>> implements
>>>>>>> *withKey* interface as a result they have same type. We differentiate
>>>>>> them
>>>>>>> while building the topology.
>>>>>>> We supported lambdas for *withKey* APIs because of the comment:
>>>>>>>
>>>>>>> @Jeyhun: I did not put any thought into this, but can we have a
>>>> design
>>>>>>>> that allows for both? Also, with regard to lambdas, it might make
>>>>> sense
>>>>>>>> to allow for both `V -> newV` and `(K, V) -> newV` ?
>>>>>>>
>>>>>>>
>>>>>>> However, I don't think that this complicates the overall design
>>>>>>> significantly.
>>>>>>>
>>>>>>>
>>>>>>> Depending on what we want to support, it might make sense to
>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this also
>>>>>>>> determines if we should have a "ProcessorContext KIP" before driving
>>>>>>>> this KIP further.
>>>>>>>
>>>>>>>
>>>>>>> Based on our discussion I think we should keep Rich functions as I
>>>>> don't
>>>>>>> think that they bring extra layer of overhead to library.
>>>>>>>
>>>>>>> Any comments are appreciated.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jeyhun
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 16, 2017 at 12:10 AM Matthias J. Sax <
>>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Jeyhun,
>>>>>>>>
>>>>>>>> thanks for the update.
>>>>>>>>
>>>>>>>> I think supporting Lambdas for `withKey` and `AbstractRichFunction`
>>>>>>>> don't go together, as Lambdas are only supported for interfaces
>>>> AFAIK.
>>>>>>>>
>>>>>>>> Thus, if we want to support Lambdas for `withKey`, we need to have a
>>>>>>>> interface approach like this
>>>>>>>>
>>>>>>>>   - RichFunction -> only adding init() and close()
>>>>>>>>
>>>>>>>>   - ValueMapper
>>>>>>>>   - ValueMapperWithKey
>>>>>>>>
>>>>>>>>   - RichValueMapper extends ValueMapperWithKey, RichFunction
>>>>>>>>
>>>>>>>> For this approach, AbstractRichFunction does not make sense anymore,
>>>>> as
>>>>>>>> the only purpose of `RichFunction` is to allow the implementation of
>>>>>>>> init() and close() -- if you don't want those, you would implement a
>>>>>>>> different interface (ie, ValueMapperWithKey)
>>>>>>>>
>>>>>>>> As an alternative, we could argue, that it is sufficient to support
>>>>>>>> Lambdas for the "plain" API only, but not for any "extended API".
>>>> For
>>>>>>>> this, RichFunction could add key+init+close and AbstractRichFunction
>>>>>>>> would allow to only care about getting the key.
>>>>>>>>
>>>>>>>> Not sure, which one is better. I don't like the idea of more
>>>>> overloaded
>>>>>>>> methods to get Lambdas for `withKey` interfaces too much because we
>>>>> have
>>>>>>>> already so many overlaods. On the other hand, I do see value in
>>>>>>>> supporting Lambdas for `withKey`.
>>>>>>>>
>>>>>>>> Depending on what we want to support, it might make sense to
>>>>>>>> include/exclude RichFunctions from this KIP -- and thus, this also
>>>>>>>> determines if we should have a "ProcessorContext KIP" before driving
>>>>>>>> this KIP further.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/15/17 11:01 AM, Jeyhun Karimov wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Sorry for super late response. Thanks for your comments.
>>>>>>>>>
>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little bit? I
>>>>> cannot
>>>>>>>>>> follow the explanation in the KIP to see what the problem is.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - From [1] says "A functional interface is an interface that has
>>>> just
>>>>>> one
>>>>>>>>> abstract method, and thus represents a single function contract".
>>>>>>>>> So basically once we extend some interface from another (in our
>>>> case,
>>>>>>>>> ValueMapperWithKey from ValueMapper) we cannot use lambdas in the
>>>>>>>> extended
>>>>>>>>> interface.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Further comments:
>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe reformat the
>>>>>> wiki
>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - I will work on the KIP.
>>>>>>>>>
>>>>>>>>>  - What about KStream-KTable joins? You don't have overlaods added
>>>>> for
>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add any new
>>>>>>>>>> overloads)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Actually there are more than one Processor and public APIs to be
>>>>>>>>> changed (KStream-KTable
>>>>>>>>> joins is one case). However all of them has similar structure: we
>>>>>>>> overload
>>>>>>>>> the *method* with  *methodWithKey*,
>>>>>>>>> wrap it into the Rich function, send to processor and inside the
>>>>>>>> processor
>>>>>>>>> call *init* and *close* methods of the Rich function.
>>>>>>>>> As I wrote in KIP, I wanted to demonstrate the overall idea with
>>>> only
>>>>>>>>> *ValueMapper* as the same can be applied to all changes.
>>>>>>>>> Anyway I will update the KIP.
>>>>>>>>>
>>>>>>>>>  - Why do we need `AbstractRichFunction`?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Instead of overriding the *init(ProcessorContext p)* and* close()*
>>>>>>>> methods
>>>>>>>>> in every Rich function with empty body like:
>>>>>>>>>
>>>>>>>>> @Override
>>>>>>>>> void init(ProcessorContext context) {}
>>>>>>>>>
>>>>>>>>> @Override
>>>>>>>>> void close () {}
>>>>>>>>>
>>>>>>>>> I thought that we can override them once in *AbstractRichFunction*
>>>>> and
>>>>>>>>> extent new Rich functions from *AbstractRichFunction*.
>>>>>>>>> Basically this can eliminate code copy-paste and ease the
>>>>> maintenance.
>>>>>>>>>
>>>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
>>>>> Predicate,
>>>>>>>>>> Reducer? I don't want to say we should/need to add to all, but we
>>>>>> should
>>>>>>>>>> discuss all of them and add where it does make sense (e.g.,
>>>>>>>>>> RichForachAction does make sense IMHO)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Definitely agree. As I said, the same technique applies to all this
>>>>>>>>> interfaces and I didn't want to explode the KIP, just wanted to
>>>> give
>>>>>> the
>>>>>>>>> overall intuition.
>>>>>>>>> However, I will update the KIP as I said.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
>>>>>> `RichValueXX`
>>>>>>>>>> in general -- but why can't we do all this with interfaces only?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sure we can. However the main intuition is we should not force
>>>> users
>>>>> to
>>>>>>>>> implement *init(ProcessorContext)* and *close()* functions every
>>>> time
>>>>>>>> they
>>>>>>>>> use Rich functions.
>>>>>>>>> If one needs, she can override the respective methods. However, I
>>>> am
>>>>>> open
>>>>>>>>> for discussion.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
>>>> further
>>>>>> than
>>>>>>>>>> it currently is. So maybe we need another KIP that is done before
>>>>>> this?
>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too large.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is good point. I wanted to make *init(ProcessorContext)*
>>>> method
>>>>>>>>> persistent among the library (which use ProcessorContext as an
>>>>> input),
>>>>>>>>> therefore I put *ProcessorContext* as an input.
>>>>>>>>> So the important question is that (as @dguy and @mjsax mentioned)
>>>>>> whether
>>>>>>>>> continue this KIP without providing users an access to
>>>>>> *ProcessorContext*
>>>>>>>>> (change *init (ProcessorContext)* to * init()* ) or
>>>>>>>>> initiate another KIP before this.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://cr.openjdk.java.net/~mr/se/8/java-se-8-pfd-spec/java-
>>>>>>>> se-8-jls-pfd-diffs.pdf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jeyhun
>>>>>>>>>
>>>>>>>>> On Mon, May 15, 2017 at 7:15 PM, Damian Guy <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'd rather not see the use of  `ProcessorContext` spread any
>>>> further
>>>>>>>> than
>>>>>>>>>> it currently is. So maybe we need another KIP that is done before
>>>>>> this?
>>>>>>>>>> Otherwise i think the scope of this KIP is becoming too large.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 15 May 2017 at 18:06 Matthias J. Sax <
>>>> [email protected]
>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I agree that that `ProcessorContext` interface is too broad in
>>>>>> general
>>>>>>>>>>> -- this is even true for transform/process, and it's also
>>>> reflected
>>>>>> in
>>>>>>>>>>> the API improvement list we want to do.
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>>>>>> Kafka+Streams+Discussions
>>>>>>>>>>>
>>>>>>>>>>> So I am wondering, if you question the `RichFunction` approach in
>>>>>>>>>>> general? Or if you suggest to either extend the scope of this KIP
>>>>> to
>>>>>>>>>>> include this---or maybe better, do another KIP for it and delay
>>>>> this
>>>>>>>> KIP
>>>>>>>>>>> until the other one is done?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>> On 5/15/17 2:35 AM, Damian Guy wrote:
>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not convinced on the `RichFunction` approach. Do we really
>>>>> want
>>>>>> to
>>>>>>>>>>> give
>>>>>>>>>>>> every DSL method access to the `ProcessorContext` ? It has a
>>>> bunch
>>>>>> of
>>>>>>>>>>>> methods on it that seem in-appropriate for some of the DSL
>>>>> methods,
>>>>>>>>>> i.e,
>>>>>>>>>>>> `register`, `getStateStore`, `forward`, `schedule` etc. It is
>>>> far
>>>>>> too
>>>>>>>>>>>> broad. I think it would be better to have a narrower interface
>>>>> like
>>>>>>>> the
>>>>>>>>>>>> `RecordContext`  - remembering it is easier to add
>>>>>> methods/interfaces
>>>>>>>>>>> later
>>>>>>>>>>>> than to remove them
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, 13 May 2017 at 22:26 Matthias J. Sax <
>>>>> [email protected]
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not an expert on Lambdas. Can you elaborate a little bit?
>>>> I
>>>>>>>>>> cannot
>>>>>>>>>>>>> follow the explanation in the KIP to see what the problem is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For updating the KIP title I don't know -- guess it's up to
>>>> you.
>>>>>>>>>> Maybe a
>>>>>>>>>>>>> committer can comment on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Further comments:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  - The KIP get a little hard to read -- can you maybe reformat
>>>>> the
>>>>>>>>>> wiki
>>>>>>>>>>>>> page a little bit? I think using `CodeBlock` would help.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  - What about KStream-KTable joins? You don't have overlaods
>>>>> added
>>>>>>>> for
>>>>>>>>>>>>> them. Why? (Even if I still hope that we don't need to add any
>>>>> new
>>>>>>>>>>>>> overloads)
>>>>>>>>>>>>>
>>>>>>>>>>>>>  - Why do we need `AbstractRichFunction`?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  - What about interfaces Initializer, ForeachAction, Merger,
>>>>>>>>>> Predicate,
>>>>>>>>>>>>> Reducer? I don't want to say we should/need to add to all, but
>>>> we
>>>>>>>>>> should
>>>>>>>>>>>>> discuss all of them and add where it does make sense (e.g.,
>>>>>>>>>>>>> RichForachAction does make sense IMHO)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw: I like the hierarchy `ValueXX` -- `ValueXXWithKey` --
>>>>>>>>>> `RichValueXX`
>>>>>>>>>>>>> in general -- but why can't we do all this with interfaces
>>>> only?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/11/17 7:00 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your comments. I think we cannot extend the two
>>>>>>>> interfaces
>>>>>>>>>>> if
>>>>>>>>>>>>> we
>>>>>>>>>>>>>> want to keep lambdas. I updated the KIP [1]. Maybe I should
>>>>> change
>>>>>>>>>> the
>>>>>>>>>>>>>> title, because now we are not limiting the KIP to only
>>>>>> ValueMapper,
>>>>>>>>>>>>>> ValueTransformer and ValueJoiner.
>>>>>>>>>>>>>> Please feel free to comment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, May 9, 2017 at 7:36 PM Matthias J. Sax <
>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If `ValueMapperWithKey` extends `ValueMapper` we don't need
>>>> the
>>>>>> new
>>>>>>>>>>>>>>> overlaod.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And yes, we need to do one check -- but this happens when
>>>>>> building
>>>>>>>>>> the
>>>>>>>>>>>>>>> topology. At runtime (I mean after KafkaStream#start() we
>>>> don't
>>>>>>>> need
>>>>>>>>>>> any
>>>>>>>>>>>>>>> check as we will always use `ValueMapperWithKey`)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 5/9/17 2:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for feedback.
>>>>>>>>>>>>>>>> Then we need to overload method
>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ?
>>>>> extends
>>>>>>>>>> VR>
>>>>>>>>>>>>>>>> mapper);
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>   <VR> KStream<K, VR> mapValues(ValueMapperWithKey<? super
>>>> V,
>>>>> ?
>>>>>>>>>>> extends
>>>>>>>>>>>>>>> VR>
>>>>>>>>>>>>>>>> mapper);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and in runtime (inside processor) we still have to check it
>>>> is
>>>>>>>>>>>>>>> ValueMapper
>>>>>>>>>>>>>>>> or ValueMapperWithKey before wrapping it into the rich
>>>>> function.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, May 9, 2017 at 10:56 AM Michal Borowiecki <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 08/05/17 23:52, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was reading the updated KIP and I am wondering, if we
>>>>> should
>>>>>>>> do
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> design a little different.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Instead of distinguishing between a RichFunction and
>>>>>>>>>>> non-RichFunction
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> runtime level, we would use RichFunctions all the time.
>>>>> Thus,
>>>>>> on
>>>>>>>>>>> the
>>>>>>>>>>>>>>> DSL
>>>>>>>>>>>>>>>>>> entry level, if a user provides a non-RichFunction, we
>>>> wrap
>>>>> it
>>>>>>>>>> by a
>>>>>>>>>>>>>>>>>> RichFunction that is fully implemented by Streams. This
>>>>>>>>>>> RichFunction
>>>>>>>>>>>>>>>>>> would just forward the call omitting the key:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Just to sketch the idea (incomplete code snippet):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> public StreamsRichValueMapper implements
>>>> RichValueMapper()
>>>>> {
>>>>>>>>>>>>>>>>>>>    private ValueMapper userProvidedMapper; // set by
>>>>>>>> constructor
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    public VR apply(final K key, final V1 value1, final V2
>>>>>>>>>> value2)
>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>        return userProvidedMapper(value1, value2);
>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  From a performance point of view, I am not sure if the
>>>>>>>>>>>>>>>>>> "if(isRichFunction)" including casts etc would have more
>>>>>>>> overhead
>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>>> this approach (we would do more nested method call for
>>>>>>>>>>>>> non-RichFunction
>>>>>>>>>>>>>>>>>> which should be more common than RichFunctions).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This approach should not effect lambdas (or do I miss
>>>>>>>> something?)
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> might be cleaner, as we could have one more top level
>>>>>> interface
>>>>>>>>>>>>>>>>>> `RichFunction` with methods `init()` and `close()` and
>>>> also
>>>>>>>>>>>>> interfaces
>>>>>>>>>>>>>>>>>> for `RichValueMapper` etc. (thus, no abstract classes
>>>>>> required).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 5/6/17 5:29 PM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for comments. I extended PR and KIP to include
>>>> rich
>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>> will still have to evaluate the cost of deep copying of
>>>>> keys.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 8:02 PM Mathieu Fenniak <
>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My opinion would be that documenting the immutability of
>>>>> the
>>>>>>>>>> key
>>>>>>>>>>> is
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> best approach available.  Better than requiring the key
>>>> to
>>>>>> be
>>>>>>>>>>>>>>>>> serializable
>>>>>>>>>>>>>>>>>>>> (as with Jeyhun's last pass at the PR), no performance
>>>>> risk.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It'd be different if Java had immutable type constraints
>>>>> of
>>>>>>>>>> some
>>>>>>>>>>>>>>> kind.
>>>>>>>>>>>>>>>>> :-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 11:31 AM, Matthias J. Sax <
>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Agreed about RichFunction. If we follow this path, it
>>>>>> should
>>>>>>>>>>> cover
>>>>>>>>>>>>>>>>>>>>> all(?) DSL interfaces.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> About guarding the key -- I am still not sure what to
>>>> do
>>>>>>>> about
>>>>>>>>>>>>> it...
>>>>>>>>>>>>>>>>>>>>> Maybe it might be enough to document it (and name the
>>>> key
>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>> `readOnlyKey` to make is very clear). Ultimately, I
>>>> would
>>>>>>>>>> prefer
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> guard against any modification, but I have no good idea
>>>>> how
>>>>>>>> to
>>>>>>>>>>> do
>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>> Not sure what others think about the risk of corrupted
>>>>>>>>>>>>> partitioning
>>>>>>>>>>>>>>>>>>>>> (what would be a user error and we could say, well, bad
>>>>>> luck,
>>>>>>>>>>> you
>>>>>>>>>>>>>>> got
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> bug in your code, that's not our fault), vs deep copy
>>>>> with
>>>>>> a
>>>>>>>>>>>>>>> potential
>>>>>>>>>>>>>>>>>>>>> performance hit (that we can't quantity atm without any
>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>>>>> test).
>>>>>>>>>>>>>>>>>>>>> We do have a performance system test. Maybe it's worth
>>>>> for
>>>>>>>> you
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> apply
>>>>>>>>>>>>>>>>>>>>> the deep copy strategy and run the test. It's very
>>>> basic
>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>>>>>> test only, but might give some insight. If you want to
>>>> do
>>>>>>>>>> this,
>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>> into folder "tests" for general test setup, and into
>>>>>>>>>>>>>>>>>>>>> "tests/kafaktests/benchmarks/streams" to find find the
>>>>> perf
>>>>>>>>>>> test.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 5/5/17 8:55 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think extending KIP to include RichFunctions totally
>>>>>>>> makes
>>>>>>>>>>>>>>> sense.
>>>>>>>>>>>>>>>>>>>> So,
>>>>>>>>>>>>>>>>>>>>>>   we don't want to guard the keys because it is
>>>> costly.
>>>>>>>>>>>>>>>>>>>>>> If we introduce RichFunctions I think it should not be
>>>>>>>>>> limited
>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>> the 3
>>>>>>>>>>>>>>>>>>>>>> interfaces proposed in KIP but to wide range of
>>>>>> interfaces.
>>>>>>>>>>>>>>>>>>>>>> Please correct me if I am wrong.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, May 5, 2017 at 12:04 AM Matthias J. Sax <
>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> One follow up. There was this email on the user list:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> http://search-hadoop.com/m/Kafka/uyzND17KhCaBzPSZ1?subj=
>>>>>>>>>>>>>>>>>>>>>
>>>>> Shouldn+t+the+initializer+of+a+stream+aggregate+accept+the+
>>>>>>>>>> key+
>>>>>>>>>>>>>>>>>>>>>>> It might make sense so include Initializer, Adder,
>>>> and
>>>>>>>>>>>>> Substractor
>>>>>>>>>>>>>>>>>>>>>>> inferface, too.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> And we should double check if there are other
>>>> interface
>>>>>> we
>>>>>>>>>>> might
>>>>>>>>>>>>>>>>> miss
>>>>>>>>>>>>>>>>>>>>> atm.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 5/4/17 1:31 PM, Matthias J. Sax wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for updating the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Deep copying the key will work for sure, but I am
>>>>>> actually
>>>>>>>>>> a
>>>>>>>>>>>>>>> little
>>>>>>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>>>>>>>> worried about performance impact... We might want to
>>>>> do
>>>>>>>>>> some
>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> quantify this impact.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Btw: this remind me about the idea of `RichFunction`
>>>>>>>>>>> interface
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> would allow users to access record metadata (like
>>>>>>>>>> timestamp,
>>>>>>>>>>>>>>>>> offset,
>>>>>>>>>>>>>>>>>>>>>>>> partition etc) within DSL. This would be a similar
>>>>>>>> concept.
>>>>>>>>>>>>>>> Thus, I
>>>>>>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>>>>> wondering, if it would make sense to enlarge the
>>>> scope
>>>>>> of
>>>>>>>>>>> this
>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>> that? WDYT?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 5/3/17 2:08 AM, Jeyhun Karimov wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for feedback. I followed similar approach
>>>> and
>>>>>>>>>> updated
>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. I tried to guard the key in Processors
>>>>>>>>>> sending
>>>>>>>>>>> a
>>>>>>>>>>>>>>> copy
>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>> actual key.
>>>>>>>>>>>>>>>>>>>>>>>>> Because I am doing deep copy of an object, I think
>>>>>> memory
>>>>>>>>>>> can
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>> bottleneck
>>>>>>>>>>>>>>>>>>>>>>>>> in some use-cases.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 2, 2017 at 5:10 PM Mathieu Fenniak <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This approach would change ValueMapper (...etc) to
>>>>> be
>>>>>>>>>>>>> classes,
>>>>>>>>>>>>>>>>>>>> rather
>>>>>>>>>>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, which is also a backwards incompatible
>>>>>>>>>> change.
>>>>>>>>>>>>> An
>>>>>>>>>>>>>>>>>>>>>>> alternative
>>>>>>>>>>>>>>>>>>>>>>>>>> approach that would be backwards compatible would
>>>> be
>>>>>> to
>>>>>>>>>>>>> define
>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>> interfaces, and provide overrides where those
>>>>>> interfaces
>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> used.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, making the key parameter as "final"
>>>>>>>>>> doesn't
>>>>>>>>>>>>>>> change
>>>>>>>>>>>>>>>>>>>>> much
>>>>>>>>>>>>>>>>>>>>>>>>>> about guarding against key change.  It only
>>>> prevents
>>>>>> the
>>>>>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>>>>>> variable
>>>>>>>>>>>>>>>>>>>>>>>>>> from being reassigned.  If the key type is a
>>>> mutable
>>>>>>>>>> object
>>>>>>>>>>>>>>> (eg.
>>>>>>>>>>>>>>>>>>>>>>> byte[]),
>>>>>>>>>>>>>>>>>>>>>>>>>> it can still be mutated. (eg. key[0] = 0).  But
>>>> I'm
>>>>>> not
>>>>>>>>>>>>> really
>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>>>>>>>>>>>> much that can be done about that.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 5:39 PM, Jeyhun Karimov <
>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for comments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The concerns makes sense. Although we can guard
>>>> for
>>>>>>>>>>>>> immutable
>>>>>>>>>>>>>>>>> keys
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>> current implementation (with few changes), I
>>>> didn't
>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>>>>> backward
>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> In this case 2 solutions come to my mind. In both
>>>>>>>> cases,
>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>> accesses
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> key in Object type, as passing extra type
>>>> parameter
>>>>>>>> will
>>>>>>>>>>>>> break
>>>>>>>>>>>>>>>>>>>>>>>>>>> backwards-compatibility.  So user has to cast to
>>>>>> actual
>>>>>>>>>>> key
>>>>>>>>>>>>>>>>> type.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Firstly, We can overload apply method with 2
>>>>>>>> argument
>>>>>>>>>>>>> (key
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> value)
>>>>>>>>>>>>>>>>>>>>>>>>>>> and force key to be *final*. By doing this,  I
>>>>> think
>>>>>> we
>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>>>>>>>>>>> backward-compatibility and guarding against key
>>>>>> change.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Secondly, we can create class KeyAccess like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> public class KeyAccess {
>>>>>>>>>>>>>>>>>>>>>>>>>>>      Object key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>      public void beforeApply(final Object key) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>          this.key = key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>>>>      public Object getKey() {
>>>>>>>>>>>>>>>>>>>>>>>>>>>          return key;
>>>>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> We can extend *ValueMapper, ValueJoiner* and
>>>>>>>>>>>>>>> *ValueTransformer*
>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>>>>> *KeyAccess*. Inside processor (for example
>>>>>>>>>>>>>>>>>>>>> *KTableMapValuesProcessor*)
>>>>>>>>>>>>>>>>>>>>>>>>>>> before calling *mapper.apply(value)* we can set
>>>> the
>>>>>>>>>> *key*
>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>>>>> *mapper.beforeApply(key)*. As a result, user can
>>>>> use
>>>>>>>>>>>>>>> *getKey()*
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>>>>>>>>>>>> the key inside *apply(value)* method.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 7:24 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks a lot for the KIP!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think there are two issues we need to address:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) The KIP does not consider backward
>>>>>> compatibility.
>>>>>>>>>>> Users
>>>>>>>>>>>>>>> did
>>>>>>>>>>>>>>>>>>>>>>>>>> complain
>>>>>>>>>>>>>>>>>>>>>>>>>>>> about this in past releases already, and as the
>>>>> user
>>>>>>>>>> base
>>>>>>>>>>>>>>>>> grows,
>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>>> should not break backward compatibility in
>>>> future
>>>>>>>>>>> releases
>>>>>>>>>>>>>>>>>>>> anymore.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thus, we should think of a better way to allow
>>>> key
>>>>>>>>>>> access.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu's comment goes into the same direction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile
>>>>> failures
>>>>>>>>>> that
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Another concern is, that there is no guard
>>>> to
>>>>>>>>>> prevent
>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> modify the key. This might corrupt partitioning
>>>> if
>>>>>>>>>> users
>>>>>>>>>>> do
>>>>>>>>>>>>>>>>> alter
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> key (accidentally -- or users are just not aware
>>>>>> that
>>>>>>>>>>> they
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>>> allowed to modify the provided key object) and
>>>>> thus
>>>>>>>>>> break
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> application. (This was the original motivation
>>>> to
>>>>>> not
>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the first place -- it's guards against
>>>>>> modification.)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 5/1/17 6:31 AM, Mathieu Fenniak wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeyhun,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I just want to add my voice that, I too, have
>>>>>> wished
>>>>>>>>>> for
>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> record key during a mapValues or similar
>>>>> operation.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, the number of compile
>>>> failures
>>>>>>>> that
>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed from this change is unfortunate. :-)  But
>>>>> at
>>>>>>>>>> least
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty clear and easy change.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, May 1, 2017 at 6:55 AM, Jeyhun Karimov
>>>> <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to share KIP-149 [1] based on issues
>>>>>>>>>> KAFKA-4218
>>>>>>>>>>>>> [2],
>>>>>>>>>>>>>>>>>>>>>>>>>> KAFKA-4726
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> KAFKA-3745 [4]. The related PR can be found at
>>>>>> [5].
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to get your comments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/
>>>>>>>>>> confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>> 149%3A+Enabling+key+access+in+ValueTransformer%2C+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ValueMapper%2C+and+ValueJoiner
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira
>>>>>>>> /browse/KAFKA-4218
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira
>>>>>>>> /browse/KAFKA-4726
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira
>>>>>>>> /browse/KAFKA-3745
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [5] https://github.com/apache/kafka/pull/2946
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeyhun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>> -Cheers
>>>>>
>>>>> Jeyhun
>>>>>
>>>>
>>
>> --
> -Cheers
> 
> Jeyhun
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS]: KIP-149: Enabling key access in ValueTransformer, ValueMapper, and ValueJoiner

Reply via email to