HI HuWeihua,

I think your issue should resolve with 1.9.2 and 1.10 (not released but in
progress).
You can check the related Jira ticket [1].

Best,
Andrey

[1] https://jira.apache.org/jira/browse/FLINK-12122

On Wed, Jan 15, 2020 at 10:08 AM HuWeihua <huweihua....@gmail.com> wrote:

> Hi, All
> We encountered some problems during the upgrade from Flink 1.5 to Flink
> 1.9. Flink's scheduling strategy has changed. Flink 1.9 prefers centralized
> scheduling, while Flink 1.5 prefers decentralized scheduling. This change
> has caused resources imbalance and blocked our upgrade plan. We have
> thousands of jobs that need to be upgraded.
>
> For example,
> There is a job with 10 sources and 100 sinks. Each source need 1 core and
> each sink need 0.1 core.
> Try to run this job on Yarn, configure the numberOfTaskSlots is 10,
> yarn.containers.vcores is 2.
>
> When using Flink-1.5:
> Each TaskManager will run 1 source and 9 sinks, they need 1.9 cores
> totally. So the job with this configuration works very well. The schedule
> results is shown in Figure 1.
> When using Flink-1.9:
> The 10 sources will be scheduled to one TaskManager  and the 100 sinks
> will scheduled to other 10 TaskManagers.  The schedule results is shown
> in Figure 2.
> In this scenario, the TaskManager which run sources need 10 cores, other
> TaskManagers need 1 cores. But TaskManager must be configured the same, So
> we need 11 TaskManager with 10 cores.
> This situation waste (10-2)*11 = 88 cores more than Flink 1.5.
>
> In addition to the waste of resources, we also encountered other problems
> caused by centralized scheduling strategy.
>
>    1. Network bandwidth. Tasks of the same type are scheduled to the one
>    TaskManager, causing too much network traffic on the machine.
>
>
>    1. Some jobs need to sink to the local agent. After centralized
>    scheduling, the insufficient processing capacity of the single machine
>    causes a backlog of consumption.
>
>
> In summary, we think a decentralized scheduling strategy is necessary.
>
>
> Figure 1. Flink 1.5 schedule results
>
> Figure 2. Flink 1.9 schedule results
>
>
>
> Best
> Weihua Hu
>
>

Reply via email to