Hi all,
In distribution computing system, execution parallelism is vital for both
resource efficiency and execution performance. In Flink, execution
parallelism is a pre-specified parameter, which is usually an empirical
value and thus might not be optimal on the various amount of data processed
by each task.

Furthermore, a fixed parallelism cannot scale to varying data size, which
is common in production cluster, since we may not frequently change the
cluster configuration.

Thus, we propose adaptively determine the execution parallelism of each
vertex at runtime based on the actual input data size and an ideal data
size processed by each task. The ideal data size is a pre-specified
parameter according to the property of the operator.

The design doc is ready:
https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing,
any comments are highly appreciated.

Reply via email to