I’m also for approach #1. Now is nice time to apply some API-breaking changes.
> On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > > I’m also for Approach #1. I like simplifying things. >> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote: >> >> Hi, >> >> +1 for removing the Combinable annotation. Approach 1 sounds like the >> best option to me. >> >> >> On 13 January 2016 at 14:11, Till Rohrmann <trohrm...@apache.org> wrote: >> >>> Hi Fabian, >>> >>> thanks for bringing this issue up. I agree with you that it would be nice >>> to remove the Combinable annotation if it is not really needed. Now if >>> people are not aware of the Combinable interface then they might think that >>> they are actually using combiners by simply implementing CombineFunction. >>> >>> >> has happened to me :S >> >> >> >>> I would also be in favour of approach 1 combined with a migration guide >>> where we highlight possible problems with a faulty combine implementation. >>> >>> >> Migration guide is a good idea! >> >> >> >>> Cheers, >>> Till >>> >>> >>> >> -Vasia. >> >> >> >>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhue...@gmail.com> wrote: >>> >>>> Hi everybody, >>>> >>>> >>>> >>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs. >>>> >>>> I would like to start a discussion about removing the Combinable >>> annotation >>>> from the DataSet API. >>>> >>>> >>>> >>>> # The Current State: >>>> >>>> The DataSet API offers two interface for combine functions, >>> CombineFunction >>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions >>>> (GRF). >>>> >>>> >>>> However, implementing a combine interface does not make a GRF combinable. >>>> In addition, the GRF class needs to be annotated with the Combinable >>>> annotation. The RichGroupReduceFunction has a default implementation of >>>> combine, which forwards the combine parameters to the reduce method. This >>>> default implementation is not used, unless the class that extends RGRF >>> has >>>> the Combinable annotation. >>>> >>>> >>>> >>>> In addition to the Combinable annotation, the DataSet API >>>> GroupReduceOperator features the setCombinable(boolean) method to >>> override >>>> a missing or existing Combinable annotation. >>>> >>>> >>>> >>>> # My Proposal: >>>> >>>> I propose to remove the Combinable annotation because it is not required >>>> and complicates the definition of combiners. Instead, the combine method >>> of >>>> a GroupReduceFunction should be executed if it implements one of the >>>> combine function interfaces. This would require to remove the default >>>> combine implementation of the RichGroupReduceFunction as well. >>>> >>>> This would be an API breaking change and has a few implications. >>>> >>>> >>>> >>>> # Possible Implementation: >>>> >>>> There are a few ways to implement this change. >>>> >>>> - Remove Combinable annotation or mark it deprecated (and keep effect) >>>> >>>> - Remove default combine method from RichGroupReduceFunction or deprecate >>>> it >>>> >>>> >>>> >>>> Approach 1: >>>> - Remove Combinable annotation >>>> - Remove default combine() method from RichGroupReduceFunction >>>> - Effect: >>>> - All RichGroupReduceFunctions that do either have the Combinable >>>> annotation or implement the combine method do not compile anymore >>>> - GroupReduceFunctions that have the Combinable annotation do not >>>> compile anymore >>>> - GroupReduceFunctions that implement a combine interface without >>>> having the annotation (and not being set to combinable during program >>>> construction) will execute the previously not executed combine method. >>> This >>>> might change the behavior of the program. In worst case, the program >>> might >>>> silently produce wrong results (or crash) if the combiner implementation >>>> was faulty. In best case, the program executes faster. >>>> >>>> >>>> >>>> Approach 2: >>>> - As Approach 1 >>>> - In addition extend both combine interfaces with a deprecated marker >>>> method. This will ensure that all functions that implement a combinable >>>> interface do not compile anymore and need to be fixed. This could prevent >>>> the silent failure as in Approach 1, but would also cause an additional >>> API >>>> breaking change once the deprecated marker method is removed again. >>>> >>>> >>>> >>>> Approach 3: >>>> - Mark Combinable annotation deprecated >>>> - Mark combine() method in RichGroupReduceFunction as deprecated >>>> - Effect: >>>> - There'll be a couple of deprecation warnings. >>>> - We face the same problem with silent failures as in Approach 1. >>>> - We have to check if RichGroupReduceFunction's override combine or >>> not >>>> (can be done with reflection). If the method is not overridden we do not >>>> execute it (unless there is a Combinable annotation) and we are fine. If >>> it >>>> is overridden and no Combinable annotation has been defined, we have the >>>> same problem with silent failures as before. >>>> - After we remove the deprecated annotation and method, we have the >>>> same effect as with Approach 1. >>>> >>>> >>>> >>>> There are more alternatives, but these are the most viable, IMO. >>>> >>>> >>>> >>>> I think, if we want to remove the combinable annotation, we should do it >>>> now. >>>> >>>> Given the three options, would go for Approach 1. Yes, breaks a lot of >>> code >>>> and yes there is the possibility of computing incorrect results. >>> Approach 2 >>>> is safer but would mean another API breaking change in the future. >>> Approach >>>> 3 comes with fewer breaking changes but has the same problem of silent >>>> failures. >>>> >>>> IMO, the breaking API changes of Approach 1 are even desirable because >>> they >>>> will make users aware that this feature changed. >>>> >>>> >>>> >>>> What do you think? >>>> >>>> >>>> >>>> Cheers, Fabian >>>> Regards, Chiwan Park