[DISCUSS] Adaptive Parallelism of Job Vertex

Bo WANG Mon, 08 Apr 2019 01:29:23 -0700

Hi all,
In distributed computing systems, execution parallelism is vital for both
resource efficiency and execution performance. In Flink, execution
parallelism is a pre-specified parameter. It is usually an empirical
value, and thus might not be optimal for the varying amount of data
processed by each task.

Furthermore, a fixed parallelism cannot adapt to varying data sizes, which
are common in production clusters, since we cannot change the cluster
configuration frequently.

Thus, we propose to adaptively determine the execution parallelism of each
vertex at runtime, based on the actual input data size and an ideal data
size to be processed by each task. The ideal data size is a pre-specified
parameter chosen according to the properties of the operator.
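
To make the idea concrete, here is a minimal sketch of how such a decision
could be computed. It is only one plausible reading of the proposal, not the
algorithm from the design doc: it assumes the parallelism is the input size
divided by the ideal per-task size, rounded up. The class name, the method
name, and the min/max bounds are hypothetical and are not part of Flink's API.

```java
/** Illustrative sketch only; a hypothetical helper, not part of Flink. */
final class AdaptiveParallelismSketch {

    private AdaptiveParallelismSketch() {}

    /**
     * Picks a parallelism so that each task processes roughly
     * idealBytesPerTask bytes, clamped to [minParallelism, maxParallelism].
     * The clamping bounds are an assumption added for this sketch.
     */
    static int decideParallelism(
            long inputBytes,
            long idealBytesPerTask,
            int minParallelism,
            int maxParallelism) {
        if (idealBytesPerTask <= 0) {
            throw new IllegalArgumentException("idealBytesPerTask must be positive");
        }
        if (minParallelism < 1 || maxParallelism < minParallelism) {
            throw new IllegalArgumentException("invalid parallelism bounds");
        }
        // Ceiling division written so that it cannot overflow for large inputs.
        long tasks = inputBytes <= 0 ? 0 : (inputBytes - 1) / idealBytesPerTask + 1;
        // The inner result is at most maxParallelism, so the cast to int is safe.
        return (int) Math.max(minParallelism, Math.min(maxParallelism, tasks));
    }
}
```

For example, with 1 GiB of actual input and an ideal size of 256 MiB per task,
the sketch yields a parallelism of 4. An empty input falls back to the lower
bound, and a very large input is capped at the upper bound.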

The design doc is ready:
https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
Any comments are highly appreciated.
