Gábor,

Thank you for the reply. I gave that a go, and the flow still showed parallelism 90 for each step. Is the UI perhaps not 100% accurate?
To get around it for now, I implemented a partitioner that sends all the data to the same partition. It's a hack, but it works!

On Tue, Oct 3, 2017 at 4:12 AM, Gábor Gévay <gga...@gmail.com> wrote:
> Hi Garrett,
>
> You can call .setParallelism(1) on just this operator:
>
> ds.reduceGroup(new GroupReduceFunction...).setParallelism(1)
>
> Best,
> Gabor
>
> On Mon, Oct 2, 2017 at 3:46 PM, Garrett Barton <garrett.bar...@gmail.com> wrote:
> > I have a complex alg implemented using the DataSet API and by default it
> > runs with parallel 90 for good performance. At the end I want to perform a
> > clustering of the resulting data and to do that correctly I need to pass all
> > the data through a single thread/process.
> >
> > I read in the docs that as long as I did a global reduce using
> > DataSet.reduceGroup(new GroupReduceFunction....) that it would force it to a
> > single thread. Yet when I run the flow and bring it up in the UI, I see
> > parallel 90 all the way through the dag including this one.
> >
> > Is there a config or feature to force the flow back to a single thread? Or
> > should I just split this into two completely separate jobs? I'd rather not
> > split as I would like to use Flink's ability to iterate on this alg and
> > cluster combo.
> >
> > Thank you
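
For anyone finding this thread later, here is a minimal sketch of the single-partition workaround mentioned at the top of this message. It is not the code from the actual job. To keep it runnable without Flink on the classpath, it declares a local stand-in interface with the same shape as Flink's `org.apache.flink.api.common.functions.Partitioner<K>`, and the `distribute` helper is a hypothetical simulation of the shuffle. In a real job the partitioner would presumably be passed to `DataSet.partitionCustom(...)` instead; in that case the downstream operator would likely still be listed with parallelism 90, with only one subtask receiving records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SinglePartitionSketch {

    // Local stand-in (assumption for this sketch) mirroring the shape of
    // Flink's Partitioner<K>: map a key to a partition index.
    public interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Routes every record to partition 0, regardless of key or parallelism.
    public static final class ToFirstPartition<K> implements Partitioner<K> {
        @Override
        public int partition(K key, int numPartitions) {
            return 0;
        }
    }

    // Hypothetical helper that simulates the shuffle: each record is placed
    // in the bucket whose index the partitioner returns.
    public static <K> List<List<K>> distribute(
            List<K> records, Partitioner<K> partitioner, int numPartitions) {
        List<List<K>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<K>());
        }
        for (K record : records) {
            partitions.get(partitioner.partition(record, numPartitions)).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(5, 8, 13, 21);
        List<List<Integer>> partitions =
                distribute(records, new ToFirstPartition<Integer>(), 90);

        // Count how many of the 90 partitions received any data.
        int nonEmpty = 0;
        for (List<Integer> p : partitions) {
            if (!p.isEmpty()) {
                nonEmpty++;
            }
        }
        System.out.println("records in partition 0: " + partitions.get(0).size());
        System.out.println("non-empty partitions: " + nonEmpty);
    }
}
```

The point of the sketch is only that a constant partition index funnels all records through one place even when 90 partitions exist. Setting the parallelism of the final operator to 1, as Gábor suggested, expresses the same intent more directly.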