Thanks for preparing this FLIP, Weijie. I think this is a good improvement on batch resource elasticity. Looking forward to the community feedback.
Best,
Xintong

On Thu, May 19, 2022 at 2:31 PM weijie guo <guoweijieres...@gmail.com> wrote:

> Hi all,
>
> I’d like to start a discussion about FLIP-235 [1], which introduces a new
> shuffle mode that can overcome some of the problems of Pipelined Shuffle
> and Blocking Shuffle in batch scenarios.
>
> Currently in Flink, task scheduling is more or less constrained by the
> shuffle implementations. This brings the following disadvantages:
>
> 1. Pipelined Shuffle:
>    For pipelined shuffle, the upstream and downstream tasks are required
>    to be deployed at the same time, to avoid upstream tasks being blocked
>    forever. This is fine when there are enough resources for both upstream
>    and downstream tasks to run simultaneously, but causes the following
>    problems otherwise:
>    1. Tasks connected by pipelined shuffle (i.e., a pipelined region)
>       cannot be executed until resources have been obtained for all of
>       them, resulting in longer job finishing time and poorer resource
>       efficiency, because part of the resources is held idle while
>       waiting for the rest.
>    2. More severely, if multiple jobs each hold part of the cluster
>       resources and are waiting for more, a deadlock would occur. The
>       chance is not trivial, especially for scenarios such as OLAP where
>       concurrent job submissions are frequent.
> 2. Blocking Shuffle:
>    For blocking shuffle, execution of downstream tasks must wait for all
>    upstream tasks to finish, even though more resources might be
>    available. The sequential execution of upstream and downstream tasks
>    significantly increases the job finishing time, and the disk IO
>    workload for spilling and loading full intermediate data also affects
>    the performance.
>
> We believe the root cause of the above problems is that shuffle
> implementations put unnecessary constraints on task scheduling.
>
> To solve this problem, Xintong Song and I propose to introduce hybrid
> shuffle to minimize the scheduling constraints. With Hybrid Shuffle, Flink
> should:
>
> 1. Make best use of available resources.
>    Ideally, we want Flink to always make progress if possible. That is to
>    say, it should always execute a pending task if there are resources
>    available for that task.
> 2. Minimize disk IO load.
>    In-flight data should be consumed directly from memory as much as
>    possible. Only data that is not consumed in time should be spilled to
>    disk.
>
> You can find more details in FLIP-235. Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode
>
> Best regards,
> Weijie
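
To make the second goal above ("minimize disk IO load") a bit more concrete, here is a toy sketch. It is not taken from FLIP-235 and does not use any Flink API: the class name HybridBuffer, its methods, and the fixed record-count memory limit are all hypothetical, chosen only to illustrate the idea that data is served from memory when the consumer keeps up and is spilled only when it does not.

```python
from collections import deque


class HybridBuffer:
    """Toy model (hypothetical, not Flink code): hold records in memory and
    spill the oldest one only when memory is full and nobody consumed it."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()    # records still held in memory
        self.disk = deque()      # stand-in for a spill file
        self.spilled_count = 0   # number of records that ever went to "disk"

    def produce(self, record):
        # Memory is full: the oldest unconsumed record is spilled first.
        if len(self.memory) >= self.memory_capacity:
            self.disk.append(self.memory.popleft())
            self.spilled_count += 1
        self.memory.append(record)

    def consume(self):
        # Spilled records are always older than in-memory ones,
        # so reading them first preserves the production order.
        if self.disk:
            return self.disk.popleft()
        if self.memory:
            return self.memory.popleft()
        return None


# Downstream keeps up: every record is served from memory, nothing is spilled.
fast = HybridBuffer(memory_capacity=2)
for r in range(5):
    fast.produce(r)
    fast.consume()

# Downstream starts late: only the overflow beyond memory is spilled.
slow = HybridBuffer(memory_capacity=2)
for r in range(5):
    slow.produce(r)
```

In this sketch the upstream side never blocks on the downstream side (unlike pipelined shuffle), and the downstream side may start reading at any time without waiting for the upstream to finish (unlike blocking shuffle). The actual design is described in the FLIP.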