Hi, Rebalance is more safe default setting that protects against data skew. And even the smallest data skew can create a bottleneck much larger then the serialisation/network transfer cost. Especially if one changes the parallelism to a value that’s not a result of multiplication or division (like N down to N-1). And data skew can be arbitrarily large, while rebalance overhead compare to rescale is limited.
Piotrek > On 6 Feb 2018, at 04:32, Kien Truong <duckientru...@gmail.com> wrote: > > Thanks Piotr, it works. > May I ask why default behavior when reducing parallelism is rebalance, and > not rescale ? > > Regards, > Kien > > Sent from TypeApp <http://www.typeapp.com/r?b=11979> > On Feb 5, 2018, at 15:28, Piotr Nowojski <pi...@data-artisans.com > <mailto:pi...@data-artisans.com>> wrote: > Hi, > > It should work like this out of the box if you use rescale method: > > https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning > > <https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning> > > If it will not work, please let us know. > > Piotrek > >> On 3 Feb 2018, at 04:39, Kien Truong < duckientru...@gmail.com >> <mailto:duckientru...@gmail.com>> wrote: >> >> Hi, >> >> Assuming that I have a streaming job, using 30 task managers with 4 slot >> each. I want to change the parallelism of 1 operator from 120 to 30. Are >> there anyway so that each subtask of this operator get data from 4 upstream >> subtasks running in the same task manager, thus avoiding network completely >> ? >> >> Best regards, >> Kien >> >> Sent from TypeApp <http://www.typeapp.com/r?b=11979>