Hi Garrett, You can call .setParallelism(1) on just this operator:
ds.reduceGroup(new GroupReduceFunction...).setParallelism(1) Best, Gabor On Mon, Oct 2, 2017 at 3:46 PM, Garrett Barton <garrett.bar...@gmail.com> wrote: > I have a complex alg implemented using the DataSet api and by default it > runs with parallel 90 for good performance. At the end I want to perform a > clustering of the resulting data and to do that correctly I need to pass all > the data through a single thread/process. > > I read in the docs that as long as I did a global reduce using > DataSet.reduceGroup(new GroupReduceFunction....) that it would force it to a > single thread. Yet when I run the flow and bring it up in the ui, I see > parallel 90 all the way through the dag including this one. > > Is there a config or feature to force the flow back to a single thread? Or > should I just split this into two completely separate jobs? I'd rather not > split as I would like to use flinks ability to iterate on this alg and > cluster combo. > > Thank you