I’m also for approach #1. Now is nice time to apply some API-breaking changes.

> On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> I’m also for Approach #1. I like simplifying things.
>> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
>> best option to me.
>> 
>> 
>> On 13 January 2016 at 14:11, Till Rohrmann <trohrm...@apache.org> wrote:
>> 
>>> Hi Fabian,
>>> 
>>> thanks for bringing this issue up. I agree with you that it would be nice
>>> to remove the Combinable annotation if it is not really needed. Now if
>>> people are not aware of the Combinable interface then they might think that
>>> they are actually using combiners by simply implementing CombineFunction.
>>> 
>>> 
>> ​has happened to me :S​
>> 
>> 
>> 
>>> I would also be in favour of approach 1 combined with a migration guide
>>> where we highlight possible problems with a faulty combine implementation.
>>> 
>>> 
>> Migration guide is a ​good idea​!
>> 
>> 
>> 
>>> Cheers,
>>> Till
>>> ​
>>> 
>>> 
>> ​-Vasia.​
>> 
>> 
>> 
>>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> 
>>>> Hi everybody,
>>>> 
>>>> 
>>>> 
>>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>>>> 
>>>> I would like to start a discussion about removing the Combinable
>>> annotation
>>>> from the DataSet API.
>>>> 
>>>> 
>>>> 
>>>> # The Current State:
>>>> 
>>>> The DataSet API offers two interface for combine functions,
>>> CombineFunction
>>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
>>>> (GRF).
>>>> 
>>>> 
>>>> However, implementing a combine interface does not make a GRF combinable.
>>>> In addition, the GRF class needs to be annotated with the Combinable
>>>> annotation. The RichGroupReduceFunction has a default implementation of
>>>> combine, which forwards the combine parameters to the reduce method. This
>>>> default implementation is not used, unless the class that extends RGRF
>>> has
>>>> the Combinable annotation.
>>>> 
>>>> 
>>>> 
>>>> In addition to the Combinable annotation, the DataSet API
>>>> GroupReduceOperator features the setCombinable(boolean) method to
>>> override
>>>> a missing or existing Combinable annotation.
>>>> 
>>>> 
>>>> 
>>>> # My Proposal:
>>>> 
>>>> I propose to remove the Combinable annotation because it is not required
>>>> and complicates the definition of combiners. Instead, the combine method
>>> of
>>>> a GroupReduceFunction should be executed if it implements one of the
>>>> combine function interfaces. This would require to remove the default
>>>> combine implementation of the RichGroupReduceFunction as well.
>>>> 
>>>> This would be an API breaking change and has a few implications.
>>>> 
>>>> 
>>>> 
>>>> # Possible Implementation:
>>>> 
>>>> There are a few ways to implement this change.
>>>> 
>>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
>>>> 
>>>> - Remove default combine method from RichGroupReduceFunction or deprecate
>>>> it
>>>> 
>>>> 
>>>> 
>>>> Approach 1:
>>>> - Remove Combinable annotation
>>>> - Remove default combine() method from RichGroupReduceFunction
>>>> - Effect:
>>>>   - All RichGroupReduceFunctions that do either have the Combinable
>>>> annotation or implement the combine method do not compile anymore
>>>>   - GroupReduceFunctions that have the Combinable annotation do not
>>>> compile anymore
>>>>   - GroupReduceFunctions that implement a combine interface without
>>>> having the annotation (and not being set to combinable during program
>>>> construction) will execute the previously not executed combine method.
>>> This
>>>> might change the behavior of the program. In worst case, the program
>>> might
>>>> silently produce wrong results (or crash) if the combiner implementation
>>>> was faulty. In best case, the program executes faster.
>>>> 
>>>> 
>>>> 
>>>> Approach 2:
>>>> - As Approach 1
>>>> - In addition extend both combine interfaces with a deprecated marker
>>>> method. This will ensure that all functions that implement a combinable
>>>> interface do not compile anymore and need to be fixed. This could prevent
>>>> the silent failure as in Approach 1, but would also cause an additional
>>> API
>>>> breaking change once the deprecated marker method is removed again.
>>>> 
>>>> 
>>>> 
>>>> Approach 3:
>>>> - Mark Combinable annotation deprecated
>>>> - Mark combine() method in RichGroupReduceFunction as deprecated
>>>> - Effect:
>>>>   - There'll be a couple of deprecation warnings.
>>>>   - We face the same problem with silent failures as in Approach 1.
>>>>   - We have to check if RichGroupReduceFunction's override combine or
>>> not
>>>> (can be done with reflection). If the method is not overridden we do not
>>>> execute it (unless there is a Combinable annotation) and we are fine. If
>>> it
>>>> is overridden and no Combinable annotation has been defined, we have the
>>>> same problem with silent failures as before.
>>>>   - After we remove the deprecated annotation and method, we have the
>>>> same effect as with Approach 1.
>>>> 
>>>> 
>>>> 
>>>> There are more alternatives, but these are the most viable, IMO.
>>>> 
>>>> 
>>>> 
>>>> I think, if we want to remove the combinable annotation, we should do it
>>>> now.
>>>> 
>>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
>>> code
>>>> and yes there is the possibility of computing incorrect results.
>>> Approach 2
>>>> is safer but would mean another API breaking change in the future.
>>> Approach
>>>> 3 comes with fewer breaking changes but has the same problem of silent
>>>> failures.
>>>> 
>>>> IMO, the breaking API changes of Approach 1 are even desirable because
>>> they
>>>> will make users aware that this feature changed.
>>>> 
>>>> 
>>>> 
>>>> What do you think?
>>>> 
>>>> 
>>>> 
>>>> Cheers, Fabian
>>>> 

Regards,
Chiwan Park


Reply via email to