I’m also for Approach #1. I like simplifying things.
> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>
> Hi,
>
> +1 for removing the Combinable annotation. Approach 1 sounds like the
> best option to me.
>
>
> On 13 January 2016 at 14:11, Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Fabian,
>>
>> thanks for bringing this issue up. I agree with you that it would be nice
>> to remove the Combinable annotation if it is not really needed. Now if
>> people are not aware of the Combinable interface then they might think that
>> they are actually using combiners by simply implementing CombineFunction.
>>
>>
> has happened to me :S
>
>
>
>> I would also be in favour of approach 1 combined with a migration guide
>> where we highlight possible problems with a faulty combine implementation.
>>
>>
> Migration guide is a good idea!
>
>
>
>> Cheers,
>> Till
>>
>>
>>
> -Vasia.
>
>
>
>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi everybody,
>>>
>>>
>>>
>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>>>
>>> I would like to start a discussion about removing the Combinable
>> annotation
>>> from the DataSet API.
>>>
>>>
>>>
>>> # The Current State:
>>>
>>> The DataSet API offers two interface for combine functions,
>> CombineFunction
>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
>>> (GRF).
>>>
>>>
>>> However, implementing a combine interface does not make a GRF combinable.
>>> In addition, the GRF class needs to be annotated with the Combinable
>>> annotation. The RichGroupReduceFunction has a default implementation of
>>> combine, which forwards the combine parameters to the reduce method. This
>>> default implementation is not used, unless the class that extends RGRF
>> has
>>> the Combinable annotation.
>>>
>>>
>>>
>>> In addition to the Combinable annotation, the DataSet API
>>> GroupReduceOperator features the setCombinable(boolean) method to
>> override
>>> a missing or existing Combinable annotation.
>>>
>>>
>>>
>>> # My Proposal:
>>>
>>> I propose to remove the Combinable annotation because it is not required
>>> and complicates the definition of combiners. Instead, the combine method
>> of
>>> a GroupReduceFunction should be executed if it implements one of the
>>> combine function interfaces. This would require to remove the default
>>> combine implementation of the RichGroupReduceFunction as well.
>>>
>>> This would be an API breaking change and has a few implications.
>>>
>>>
>>>
>>> # Possible Implementation:
>>>
>>> There are a few ways to implement this change.
>>>
>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
>>>
>>> - Remove default combine method from RichGroupReduceFunction or deprecate
>>> it
>>>
>>>
>>>
>>> Approach 1:
>>> - Remove Combinable annotation
>>> - Remove default combine() method from RichGroupReduceFunction
>>> - Effect:
>>> - All RichGroupReduceFunctions that do either have the Combinable
>>> annotation or implement the combine method do not compile anymore
>>> - GroupReduceFunctions that have the Combinable annotation do not
>>> compile anymore
>>> - GroupReduceFunctions that implement a combine interface without
>>> having the annotation (and not being set to combinable during program
>>> construction) will execute the previously not executed combine method.
>> This
>>> might change the behavior of the program. In worst case, the program
>> might
>>> silently produce wrong results (or crash) if the combiner implementation
>>> was faulty. In best case, the program executes faster.
>>>
>>>
>>>
>>> Approach 2:
>>> - As Approach 1
>>> - In addition extend both combine interfaces with a deprecated marker
>>> method. This will ensure that all functions that implement a combinable
>>> interface do not compile anymore and need to be fixed. This could prevent
>>> the silent failure as in Approach 1, but would also cause an additional
>> API
>>> breaking change once the deprecated marker method is removed again.
>>>
>>>
>>>
>>> Approach 3:
>>> - Mark Combinable annotation deprecated
>>> - Mark combine() method in RichGroupReduceFunction as deprecated
>>> - Effect:
>>> - There'll be a couple of deprecation warnings.
>>> - We face the same problem with silent failures as in Approach 1.
>>> - We have to check if RichGroupReduceFunction's override combine or
>> not
>>> (can be done with reflection). If the method is not overridden we do not
>>> execute it (unless there is a Combinable annotation) and we are fine. If
>> it
>>> is overridden and no Combinable annotation has been defined, we have the
>>> same problem with silent failures as before.
>>> - After we remove the deprecated annotation and method, we have the
>>> same effect as with Approach 1.
>>>
>>>
>>>
>>> There are more alternatives, but these are the most viable, IMO.
>>>
>>>
>>>
>>> I think, if we want to remove the combinable annotation, we should do it
>>> now.
>>>
>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
>> code
>>> and yes there is the possibility of computing incorrect results.
>> Approach 2
>>> is safer but would mean another API breaking change in the future.
>> Approach
>>> 3 comes with fewer breaking changes but has the same problem of silent
>>> failures.
>>>
>>> IMO, the breaking API changes of Approach 1 are even desirable because
>> they
>>> will make users aware that this feature changed.
>>>
>>>
>>>
>>> What do you think?
>>>
>>>
>>>
>>> Cheers, Fabian
>>>
>>