GitHub user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5241#discussion_r162642637
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetAggregate.scala ---
    @@ -157,6 +158,7 @@ class DataSetAggregate(
           } else {
             inputDS
               .reduceGroup(finalAgg)
    +          .mapPartition(emptyProcessMapPartition.get)
    --- End diff --
    
    I thought about this again. 
    
    I think we should extend `DataSetFinalAggFunction` to also implement 
`MapPartitionFunction`, similar to `DataSetPreAggFunction`. The 
`GroupReduceFunction.reduce()` and `MapPartitionFunction.mapPartition()` 
methods can share the same code. For the grouped aggregation we use 
`reduceGroup(function)`, and for the non-grouped aggregation we use 
`mapPartition(function).setParallelism(1)`. Setting the parallelism is 
important here, because a non-grouped aggregation must produce exactly one 
result.
    
    That way we avoid an additional function and it's a cleaner design.
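
A rough sketch of the idea, not the actual `DataSetFinalAggFunction`: one function class implements both interfaces and delegates to a shared private method. To keep the sketch compilable without Flink on the classpath, `Collector`, `GroupReduceFunction` and `MapPartitionFunction` are declared here as simplified stand-ins for the Flink interfaces of the same names. `SumFinalAggFunction` and `ListCollector` are hypothetical names, and the sum stands in for the real aggregation logic.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-ins for the Flink interfaces (assumption: the real ones
// take java.lang.Iterable and live in org.apache.flink.api.common.functions).
trait Collector[O] { def collect(record: O): Unit }
trait GroupReduceFunction[I, O] {
  def reduce(values: Iterable[I], out: Collector[O]): Unit
}
trait MapPartitionFunction[I, O] {
  def mapPartition(values: Iterable[I], out: Collector[O]): Unit
}

// Hypothetical final-aggregation function that implements both interfaces.
class SumFinalAggFunction
    extends GroupReduceFunction[Long, Long]
    with MapPartitionFunction[Long, Long] {

  // Grouped aggregation: used via reduceGroup(function).
  override def reduce(values: Iterable[Long], out: Collector[Long]): Unit =
    aggregate(values, out)

  // Non-grouped aggregation: used via mapPartition(function).setParallelism(1).
  override def mapPartition(values: Iterable[Long], out: Collector[Long]): Unit =
    aggregate(values, out)

  // Shared code path: emits exactly one record, also for empty input.
  private def aggregate(values: Iterable[Long], out: Collector[Long]): Unit =
    out.collect(values.sum)
}

// Hypothetical collector that records emitted values for inspection.
class ListCollector[O] extends Collector[O] {
  val records: ArrayBuffer[O] = ArrayBuffer.empty[O]
  override def collect(record: O): Unit = records += record
}
```

In this sketch, calling `mapPartition` with an empty input still emits one record, which is the behavior the non-grouped case is after. With a parallelism above 1, each parallel instance would emit its own record, which is presumably why `setParallelism(1)` matters.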

