Hi Paul, Flink pushes the results of operators (including GroupReduce) to the next operator or sink as soon as they are computed. So what you are asking for is actually happening. However, before the GroupReduceFunction can be applied, the whole data is sorted in order to group the data. This step is usually more expensive than applying the GroupReduceFunction. Therefore, it looks like the output is batched. Flink does only support sort-based grouping, however also hash-based grouping would not help, because Flink would not know when to close a group until all data is consumed.
Please let me know if you have further questions. Best, Fabian 2016-10-26 19:07 GMT+02:00 Paul Wilson <paulalexwil...@gmail.com>: > Hi, > > DataSet API > Flink 1.1.3 > > I have an application where I'd like to perform some mapping before > batching the results and passing them to the sink. I'm performing a > 'composite' key selection to group the items by their natural key as well > as a batch (itemCount / batchSize). When I reduce the batches and pass them > to the sink, the whole flow is waiting for all reduces to complete before > passing them to sink. > > Is there some way that the results of a single group reduce can be passed > to the sink before all reduces are complete? > > Hope that makes sense, > Regards, > Paul >