Hi,

Rebalance is more safe default setting that protects against data skew. And 
even the smallest data skew can create a bottleneck much larger then the 
serialisation/network transfer cost. Especially if one changes the parallelism 
to a value that’s not a result of multiplication or division (like N down to 
N-1). And data skew can be arbitrarily large, while rebalance overhead compare 
to rescale is limited.

Piotrek


> On 6 Feb 2018, at 04:32, Kien Truong <duckientru...@gmail.com> wrote:
> 
> Thanks Piotr, it works.
> May I ask why default behavior when reducing parallelism is rebalance, and 
> not rescale ?
> 
> Regards,
> Kien
> 
> Sent from TypeApp <http://www.typeapp.com/r?b=11979>
> On Feb 5, 2018, at 15:28, Piotr Nowojski <pi...@data-artisans.com 
> <mailto:pi...@data-artisans.com>> wrote:
> Hi,
> 
> It should work like this out of the box if you use rescale method:
> 
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning
>  
> <https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning>
> 
> If it will not work, please let us know.
> 
> Piotrek 
> 
>> On 3 Feb 2018, at 04:39, Kien Truong < duckientru...@gmail.com 
>> <mailto:duckientru...@gmail.com>> wrote:
>> 
>> Hi, 
>> 
>> Assuming that I have a streaming job, using 30 task managers with 4 slot 
>> each. I want to change the parallelism of 1 operator from 120 to 30. Are 
>> there anyway so that each subtask of this operator get data from 4 upstream 
>> subtasks running in the same task manager, thus avoiding network completely 
>> ? 
>> 
>> Best regards, 
>> Kien 
>> 
>> Sent from TypeApp <http://www.typeapp.com/r?b=11979>

Reply via email to