Hi all,

I’d like to start a discussion about FLIP-235[1], which introduce a
new shuffle mode
 can overcome some of the problems of Pipelined Shuffle and Blocking
Shuffle in batch scenarios.

Currently in Flink, task scheduling is more or less constrained by the
shuffle implementations.
This will bring the following disadvantages:

   1. Pipelined Shuffle:
    For pipelined shuffle, the upstream and downstream tasks are
required to be deployed at the same time, to avoid upstream tasks
being blocked forever. This is fine when there are enough resources
for both upstream and downstream tasks to run simultaneously, but will
cause the following problems otherwise:
   1.
      Pipelined shuffle connected tasks (i.e., a pipelined region)
cannot be executed until obtaining resources for all of them,
resulting in longer job finishing time and poorer resource efficiency
due to holding part of the resources idle while waiting for the rest.
      2.
      More severely, if multiple jobs each hold part of the cluster
resources and are waiting for more, a deadlock would occur. The chance
is not trivial, especially for scenarios such as OLAP where concurrent
job submissions are frequent.
      2. Blocking Shuffle:
    For blocking shuffle, execution of downstream tasks must wait for
all upstream tasks to finish, despite there might be more resources
available. The sequential execution of upstream and downstream tasks
significantly increase the job finishing time, and the disk IO
workload for spilling and loading full intermediate data also affects
the performance.

We believe the root cause of the above problems is that shuffle
implementations put unnecessary constraints on task scheduling.

To solve this problem, Xintong Song and I propose to introduce hybrid
shuffle to minimize the scheduling constraints. With Hybrid Shuffle,
Flink should:

   1. Make best use of available resources.
    Ideally, we want Flink to always make progress if possible. That
is to say, it should always execute a pending task if there are
resources available for that task.
   2. Minimize disk IO load.
    In-flight data should be consumed directly from memory as much as
possible. Only data that is not consumed timely should be spilled to
disk.

You can find more details in FLIP-235. Looking forward to your feedback.


[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode



Best regards,

Weijie

Reply via email to