[ https://issues.apache.org/jira/browse/FLINK-19151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song closed FLINK-19151.
--------------------------------
    Resolution: Fixed

Fixed via
* master: 00bf41f33a10b59850f0ee4fe31c0271484d6d4c
* release-1.11: bfff6b15ec7dd3a4415f6a5a9d8535ea7960e474

> Flink does not normalize container resource with correct configurations when Yarn FairScheduler is used
> --------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19151
>                 URL: https://issues.apache.org/jira/browse/FLINK-19151
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.11.2
>            Reporter: Xintong Song
>            Assignee: jinhai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0, 1.11.3
>
> h3. Problem
>
> It is part of the Yarn protocol that the requested container resource is normalized for allocation. That means an allocated container may have a resource different from (larger than or equal to) what was requested.
>
> Currently, Flink matches allocated containers to the original requests by reading the Yarn configuration and calculating how the requested resources should be normalized.
>
> What has been overlooked is that Yarn's FairScheduler (and its subclass SLSFairScheduler) overrides the normalization behavior. To be specific:
> * By default, Yarn normalizes container resources to an integer multiple of "yarn.scheduler.minimum-allocation-[mb|vcores]".
> * FairScheduler normalizes container resources to an integer multiple of "yarn.resource-types.[memory-mb|vcores].increment-allocation" (or the deprecated keys "yarn.scheduler.increment-allocation-[mb|vcores]"), while making sure the resource is no less than "yarn.scheduler.minimum-allocation-[mb|vcores]". (Both rules are illustrated in the first sketch below.)
>
> h3. Proposal for short term solution
>
> A quick and easy way to fix this problem is to also read the Yarn configuration to learn which scheduler is used, and perform the normalization calculation accordingly (see the second sketch below). This should be good enough to cover the behaviors of all schedulers that Yarn currently provides. The limitation is that Flink will not be able to deal with custom Yarn schedulers that override the normalization behavior.
>
> h3. Proposal for long term solution
>
> In the long term, it would be good to use Yarn's ContainerRequest#allocationRequestId to match allocated containers with the original requests, so that Flink no longer needs to understand how Yarn normalizes container resources (see the third sketch below).
>
> ContainerRequest#allocationRequestId was introduced in Hadoop 2.9, while Flink currently claims to be compatible with Hadoop 2.4+. Therefore, this solution does not work at the moment.
>
> Another idea is to support different container matching logic for different Hadoop versions. We can abstract the container matching into a dedicated component and provide different implementations for it. This would allow Flink to take advantage of the newer versions (e.g., work well with custom schedulers), while staying compatible with older versions that lack those advantages.
>
> Given that we need the resource-based matching anyway for the old Hadoop versions, and the cost of maintaining two sets of matching logic, I tend to treat this approach as a back-up option, to be worked on when we indeed see a need for it.
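The two normalization rules described in the issue can be illustrated with a short, self-contained sketch. The rounding arithmetic follows the description above; the class and method names, as well as the example values, are purely illustrative and are not Flink's or Yarn's actual code.

{code:java}
/** A minimal sketch of the two normalization rules described above. */
public class NormalizationSketch {

    /**
     * Default Yarn behavior: round up to an integer multiple of
     * yarn.scheduler.minimum-allocation-mb.
     */
    static long normalizeDefault(long requestedMb, long minAllocMb) {
        return ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    /**
     * FairScheduler behavior: round up to an integer multiple of the
     * increment allocation, but never below the minimum allocation.
     */
    static long normalizeFairScheduler(long requestedMb, long minAllocMb, long incrementMb) {
        long rounded = ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
        return Math.max(rounded, minAllocMb);
    }

    public static void main(String[] args) {
        // Example: 1200 MB requested, minimum-allocation-mb = 1024,
        // increment-allocation = 512.
        System.out.println(normalizeDefault(1200, 1024));            // 2048
        System.out.println(normalizeFairScheduler(1200, 1024, 512)); // 1536
    }
}
{code}

The example shows why the distinction matters: for the same 1200 MB request, the two rules produce containers of different sizes (2048 MB vs. 1536 MB), so matching allocated containers with the wrong rule fails.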
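For the short-term proposal, here is a hedged sketch of the detection step, assuming the client side can read the cluster's yarn-site.xml. The configuration keys are real Yarn keys; the class-name comparison and the plain-MB reading of the increment key are simplifying assumptions, not the eventual implementation.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/** Sketch: detect the configured scheduler to choose the normalization rule. */
class SchedulerDetectionSketch {

    static boolean usesFairSchedulerNormalization(Configuration yarnConfig) {
        String schedulerClass = yarnConfig.get(
                YarnConfiguration.RM_SCHEDULER, // "yarn.resourcemanager.scheduler.class"
                YarnConfiguration.DEFAULT_RM_SCHEDULER);
        // Matching by class name (covers FairScheduler and SLSFairScheduler)
        // avoids a compile-time dependency on the ResourceManager module.
        return schedulerClass.endsWith("FairScheduler");
    }

    static long memoryIncrementMb(Configuration yarnConfig) {
        // Prefer the new key, fall back to the deprecated one, then to the
        // minimum allocation. This assumes plain MB values; the new key also
        // accepts unit suffixes, which this sketch ignores.
        return yarnConfig.getLong("yarn.resource-types.memory-mb.increment-allocation",
                yarnConfig.getLong("yarn.scheduler.increment-allocation-mb",
                        yarnConfig.getLong(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
                                YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB)));
    }
}
{code}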
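For the long-term proposal, here is a sketch of id-based matching using the Hadoop 2.9+ AMRMClient API. The surrounding bookkeeping (the pending-request map, the fixed priority) is illustrative; only ContainerRequest#allocationRequestId and Container#getAllocationRequestId are the actual Yarn APIs being relied on.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

/** Sketch of id-based container matching (requires Hadoop 2.9+). */
class AllocationIdMatchingSketch {

    private final AtomicLong nextRequestId = new AtomicLong();
    private final Map<Long, Resource> pendingRequests = new ConcurrentHashMap<>();

    void requestContainer(AMRMClient<ContainerRequest> client, Resource resource) {
        long id = nextRequestId.incrementAndGet();
        pendingRequests.put(id, resource);
        client.addContainerRequest(
                ContainerRequest.newBuilder()
                        .capability(resource)
                        .priority(Priority.newInstance(1))
                        .allocationRequestId(id) // Hadoop 2.9+ API
                        .build());
    }

    void onContainerAllocated(Container container) {
        // Yarn echoes the id back on the allocated container, so the original
        // request is found directly, regardless of how the resource was normalized.
        Resource originallyRequested =
                pendingRequests.remove(container.getAllocationRequestId());
        // ... hand the container to whatever was waiting on this request ...
    }
}
{code}

This is exactly why the id-based approach removes the need to replicate scheduler-specific rounding: the matching key travels with the request instead of being re-derived from the normalized resource.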