Re: group by optimizations with sorted input

2020-02-14 Thread Robert Metzger
I assume you are using the DataSet API. There, you can do a combinable group reduce:
https://ci.apache.org/projects/flink/flink-docs-master/dev/batch/dataset_transformations.html#combinable-groupreducefunctions
The combine() method will be executed on the sender side, reducing the amount of data t
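
The idea behind a combinable group reduce can be sketched in plain Java. This is an illustration only: it does not use Flink's API, and the class name CombineSketch and the methods combine() and reduce() here are hypothetical stand-ins, not Flink's interfaces. Each sender pre-aggregates its own partition, so only one partial result per key leaves that sender, and the receiver merges the partials.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; hypothetical names, not Flink's API.
class CombineSketch {

    // Sender-side step: collapse one partition into a single partial sum per key.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> partition) {
        Map<String, Long> partial = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : partition) {
            partial.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return partial;
    }

    // Receiver-side step: merge the partial sums shipped by every sender.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, sum) -> result.merge(key, sum, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> p1 =
                List.of(Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("b", 5L));
        List<Map.Entry<String, Long>> p2 =
                List.of(Map.entry("a", 4L), Map.entry("b", 1L));

        Map<String, Long> c1 = combine(p1); // three records shrink to two partials
        Map<String, Long> c2 = combine(p2);

        System.out.println(reduce(List.of(c1, c2))); // prints {a=7, b=6}
    }
}
```

The sketch works because summation is associative and commutative, so partial sums can be merged in any order. An aggregate without that property cannot be split into a combine step and a final reduce step this simply.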

group by optimizations with sorted input

2020-02-13 Thread Richard Moorhead
In batch mode, if the input is sorted prior to a group-by operation, does Flink forward the aggregated data early? Is there a way to prevent grouping operations from buffering all data in a GBK operation in batch mode?
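
To make the question concrete, here is a minimal plain-Java sketch of what "forwarding early" on sorted input would mean. It makes no claim about how Flink actually executes a group-by, and the class name SortedGroupSketch and the method sumSorted() are hypothetical. Because equal keys are adjacent in sorted input, a group is complete as soon as the key changes, so only one running accumulator has to be held at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, not Flink's API.
class SortedGroupSketch {

    // Assumes the records arrive sorted by key. Emits "key=sum" for a group
    // as soon as the next key is seen, instead of buffering the whole input.
    static List<String> sumSorted(Iterable<Map.Entry<String, Long>> sortedRecords) {
        List<String> emitted = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        boolean groupOpen = false;

        for (Map.Entry<String, Long> record : sortedRecords) {
            if (groupOpen && !record.getKey().equals(currentKey)) {
                emitted.add(currentKey + "=" + sum); // previous group is complete
                sum = 0;
            }
            currentKey = record.getKey();
            sum += record.getValue();
            groupOpen = true;
        }
        if (groupOpen) {
            emitted.add(currentKey + "=" + sum); // flush the last group
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> sorted = List.of(
                Map.entry("a", 1L), Map.entry("a", 2L),
                Map.entry("b", 5L), Map.entry("c", 1L));
        System.out.println(sumSorted(sorted)); // prints [a=3, b=5, c=1]
    }
}
```

If the input is not actually sorted, this approach emits several partial results for the same key, which is why it depends on the sort order being guaranteed.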