Hi, +1 for removing the Combinable annotation. Approach 1 sounds like the best option to me.
On 13 January 2016 at 14:11, Till Rohrmann <trohrm...@apache.org> wrote: > Hi Fabian, > > thanks for bringing this issue up. I agree with you that it would be nice > to remove the Combinable annotation if it is not really needed. Now if > people are not aware of the Combinable interface then they might think that > they are actually using combiners by simply implementing CombineFunction. > > has happened to me :S > I would also be in favour of approach 1 combined with a migration guide > where we highlight possible problems with a faulty combine implementation. > > Migration guide is a good idea! > Cheers, > Till > > > -Vasia. > On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhue...@gmail.com> wrote: > > > Hi everybody, > > > > > > > > As part of the upcoming 1.0 release we are stabilizing Flink's APIs. > > > > I would like to start a discussion about removing the Combinable > annotation > > from the DataSet API. > > > > > > > > # The Current State: > > > > The DataSet API offers two interface for combine functions, > CombineFunction > > and GroupCombineFunction, which can be added to a GroupReduceFunctions > > (GRF). > > > > > > However, implementing a combine interface does not make a GRF combinable. > > In addition, the GRF class needs to be annotated with the Combinable > > annotation. The RichGroupReduceFunction has a default implementation of > > combine, which forwards the combine parameters to the reduce method. This > > default implementation is not used, unless the class that extends RGRF > has > > the Combinable annotation. > > > > > > > > In addition to the Combinable annotation, the DataSet API > > GroupReduceOperator features the setCombinable(boolean) method to > override > > a missing or existing Combinable annotation. > > > > > > > > # My Proposal: > > > > I propose to remove the Combinable annotation because it is not required > > and complicates the definition of combiners. Instead, the combine method > of > > a GroupReduceFunction should be executed if it implements one of the > > combine function interfaces. This would require to remove the default > > combine implementation of the RichGroupReduceFunction as well. > > > > This would be an API breaking change and has a few implications. > > > > > > > > # Possible Implementation: > > > > There are a few ways to implement this change. > > > > - Remove Combinable annotation or mark it deprecated (and keep effect) > > > > - Remove default combine method from RichGroupReduceFunction or deprecate > > it > > > > > > > > Approach 1: > > - Remove Combinable annotation > > - Remove default combine() method from RichGroupReduceFunction > > - Effect: > > - All RichGroupReduceFunctions that do either have the Combinable > > annotation or implement the combine method do not compile anymore > > - GroupReduceFunctions that have the Combinable annotation do not > > compile anymore > > - GroupReduceFunctions that implement a combine interface without > > having the annotation (and not being set to combinable during program > > construction) will execute the previously not executed combine method. > This > > might change the behavior of the program. In worst case, the program > might > > silently produce wrong results (or crash) if the combiner implementation > > was faulty. In best case, the program executes faster. > > > > > > > > Approach 2: > > - As Approach 1 > > - In addition extend both combine interfaces with a deprecated marker > > method. This will ensure that all functions that implement a combinable > > interface do not compile anymore and need to be fixed. This could prevent > > the silent failure as in Approach 1, but would also cause an additional > API > > breaking change once the deprecated marker method is removed again. > > > > > > > > Approach 3: > > - Mark Combinable annotation deprecated > > - Mark combine() method in RichGroupReduceFunction as deprecated > > - Effect: > > - There'll be a couple of deprecation warnings. > > - We face the same problem with silent failures as in Approach 1. > > - We have to check if RichGroupReduceFunction's override combine or > not > > (can be done with reflection). If the method is not overridden we do not > > execute it (unless there is a Combinable annotation) and we are fine. If > it > > is overridden and no Combinable annotation has been defined, we have the > > same problem with silent failures as before. > > - After we remove the deprecated annotation and method, we have the > > same effect as with Approach 1. > > > > > > > > There are more alternatives, but these are the most viable, IMO. > > > > > > > > I think, if we want to remove the combinable annotation, we should do it > > now. > > > > Given the three options, would go for Approach 1. Yes, breaks a lot of > code > > and yes there is the possibility of computing incorrect results. > Approach 2 > > is safer but would mean another API breaking change in the future. > Approach > > 3 comes with fewer breaking changes but has the same problem of silent > > failures. > > > > IMO, the breaking API changes of Approach 1 are even desirable because > they > > will make users aware that this feature changed. > > > > > > > > What do you think? > > > > > > > > Cheers, Fabian > > >