Hi Fabian, We have reworked our execution to remove the group reduce step and replaced it with a map partition and we're seeing data passing more immediately now.
Thanks for your quick reply, it was very useful. Regards, Paul On 26 October 2016 at 19:57, Fabian Hueske <fhue...@gmail.com> wrote: > Hi Paul, > > Flink pushes the results of operators (including GroupReduce) to the next > operator or sink as soon as they are computed. So what you are asking for > is actually happening. > However, before the GroupReduceFunction can be applied, the whole data is > sorted in order to group the data. This step is usually more expensive than > applying the GroupReduceFunction. Therefore, it looks like the output is > batched. > Flink does only support sort-based grouping, however also hash-based > grouping would not help, because Flink would not know when to close a group > until all data is consumed. > > Please let me know if you have further questions. > > Best, Fabian > > > 2016-10-26 19:07 GMT+02:00 Paul Wilson <paulalexwil...@gmail.com>: > >> Hi, >> >> DataSet API >> Flink 1.1.3 >> >> I have an application where I'd like to perform some mapping before >> batching the results and passing them to the sink. I'm performing a >> 'composite' key selection to group the items by their natural key as well >> as a batch (itemCount / batchSize). When I reduce the batches and pass them >> to the sink, the whole flow is waiting for all reduces to complete before >> passing them to sink. >> >> Is there some way that the results of a single group reduce can be passed >> to the sink before all reduces are complete? >> >> Hope that makes sense, >> Regards, >> Paul >> > >