For now, I think it is worth documenting and leaving it as it is. A while back we thought about adding a Static Code Analysis rule to find such cases and create a warning. For Reduce, that is quite straightforward, for GrouReduce quite tricky... Am 02.02.2016 21:55 schrieb "Greg Hogan" <c...@greghogan.com>:
> If a user modifies keyed fields of a grouped reduce during a combine then > the reduce will receive incorrect groupings. For example, a useless > modification to word count: > > public WC reduce(WC in1, WC in2) { > return new WC(in1.word + " " + in2.word, in1.count + in2.count); > } > > I don't see an efficient means to prevent this. Is this limitation worth > documenting, or can we safely assume that no one will ever attempt this? > MapReduce also has this limitation, and Spark gets around this by > separating keys and values and only presenting values to reduce. > > "Reduce on Grouped DataSet: A Reduce transformation that is applied on a > grouped DataSet reduces each group to a single element using a user-defined > reduce function. For each group of input elements, a reduce function > successively combines pairs of elements into one element until only a > single element for each group remains." > > Greg >