Hi Bo Wang,

thanks for proposing this design document. I think it is an interesting
idea to improve Flink's execution efficiency.

At the moment, the community is actively working on making Flink's
scheduler pluggable. Once this is possible, we could try this feature out
by implementing a scheduler which supports adaptive parallelism without
affecting the existing code. I think this would be a nice approach to
further evaluate and benchmark the implications of such a strategy. What do
you think?

Cheers,
Till

On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <wbeaglewatc...@gmail.com> wrote:

> Hi all,
> In distribution computing system, execution parallelism is vital for both
> resource efficiency and execution performance. In Flink, execution
> parallelism is a pre-specified parameter, which is usually an empirical
> value and thus might not be optimal on the various amount of data processed
> by each task.
>
> Furthermore, a fixed parallelism cannot scale to varying data size, which
> is common in production cluster, since we may not frequently change the
> cluster configuration.
>
> Thus, we propose adaptively determine the execution parallelism of each
> vertex at runtime based on the actual input data size and an ideal data
> size processed by each task. The ideal data size is a pre-specified
> parameter according to the property of the operator.
>
> The design doc is ready:
>
> https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
> ,
> any comments are highly appreciated.
>

Reply via email to