Xintong Song created FLINK-17390:
------------------------------------

             Summary: Container resource cannot be mapped on Hadoop 2.10+
                 Key: FLINK-17390
                 URL: https://issues.apache.org/jira/browse/FLINK-17390
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN
    Affects Versions: 1.11.0
            Reporter: Xintong Song
             Fix For: 1.11.0


In FLINK-16438, we introduced {{WorkerSpecContainerResourceAdapter}} for 
mapping between Yarn container {{Resource}} and Flink {{WorkerResourceSpec}}. 
Inside this class, we use {{Resource}} as hash map keys and set elements, 
assuming that {{Resource}} instances describing the same set of resources have 
the same hash code.

This assumption does not always hold. {{Resource}} is an abstract class and may 
have different implementations. Hadoop 2.10+ introduces {{LightWeightResource}}, 
a new implementation of {{Resource}} that overrides the {{hashCode}} method and 
is used for {{Resource}} instances created by {{Resource.newInstance}} on the AM 
side. This means that a {{Resource}} created on the AM may have a different hash 
code from an equal {{Resource}} returned from Yarn, so a hash-based lookup with 
one will miss an entry stored under the other.
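The failure mode can be reproduced without Hadoop. The following is a minimal Java sketch using hypothetical stand-ins ({{ModelResource}}, {{YarnSideResource}}, {{AmSideResource}}, {{HashMismatchDemo}}), not the real Hadoop classes, with illustrative hash formulas rather than Hadoop's actual ones. The two subclasses are equal under {{equals}} but deliberately hash differently, so a {{HashMap}} lookup with the Yarn-side key misses the entry stored under the AM-side key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's abstract Resource (not the real class):
// equality is defined on memory and vcores only.
abstract class ModelResource {
    private final long memory;
    private final int vcores;

    ModelResource(long memory, int vcores) {
        this.memory = memory;
        this.vcores = vcores;
    }

    long getMemory() { return memory; }

    int getVirtualCores() { return vcores; }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ModelResource)) {
            return false;
        }
        ModelResource other = (ModelResource) o;
        return getMemory() == other.getMemory()
                && getVirtualCores() == other.getVirtualCores();
    }

    @Override
    public int hashCode() {
        // Illustrative formula only, not Hadoop's actual implementation.
        return 31 * Long.hashCode(getMemory()) + getVirtualCores();
    }
}

// Stand-in for a Resource returned from Yarn: inherits the base hashCode.
class YarnSideResource extends ModelResource {
    YarnSideResource(long memory, int vcores) { super(memory, vcores); }
}

// Stand-in for LightWeightResource: overrides hashCode with a different
// formula, so it breaks the equals/hashCode contract across subclasses.
class AmSideResource extends ModelResource {
    AmSideResource(long memory, int vcores) { super(memory, vcores); }

    @Override
    public int hashCode() {
        return 47 * (47 + Long.hashCode(getMemory())) + getVirtualCores();
    }
}

class HashMismatchDemo {
    // Stores a value under an AM-side key, then looks it up with an equal
    // Yarn-side key. The hash codes differ, so the lookup returns null.
    static String lookup() {
        Map<ModelResource, String> specs = new HashMap<>();
        specs.put(new AmSideResource(1024, 1), "worker-spec-1");
        return specs.get(new YarnSideResource(1024, 1));
    }
}
```

Both keys compare equal, yet {{HashMap}} checks the hash before calling {{equals}}, so the stored spec is never found.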

To solve this problem, we may introduce an {{InternalResource}} as an inner 
class of {{WorkerSpecContainerResourceAdapter}}, whose {{hashCode}} method 
depends only on the fields needed by Flink (currently memory and vcores). 
{{WorkerSpecContainerResourceAdapter}} should use only {{InternalResource}} for 
internal state management, and convert the {{Resource}} instances passed into 
and returned from it.
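A minimal sketch of the proposed design, assuming a hypothetical {{ContainerResource}} interface in place of YARN's {{Resource}} and a {{String}} in place of {{WorkerResourceSpec}}. {{ResourceAdapterSketch}}, {{register}}, {{specFor}}, {{resourceFor}} and the factory parameter are illustrative names, not Flink's API. Incoming resources are converted to {{InternalResource}} before touching the maps, and outgoing ones are rebuilt through a factory standing in for {{Resource.newInstance}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical stand-in for YARN's Resource. Implementations may override
// hashCode() arbitrarily, so instances are never used as map keys directly.
interface ContainerResource {
    long getMemory();
    int getVirtualCores();
}

// Hypothetical, simplified sketch of the proposed adapter design.
class ResourceAdapterSketch {

    // Flink-owned key type: equality and hash depend only on memory and vcores.
    static final class InternalResource {
        private final long memory;
        private final int vcores;

        InternalResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        static InternalResource of(ContainerResource resource) {
            return new InternalResource(resource.getMemory(), resource.getVirtualCores());
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof InternalResource)) {
                return false;
            }
            InternalResource other = (InternalResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return Objects.hash(memory, vcores);
        }
    }

    // Internal state is keyed only by InternalResource; String stands in for WorkerResourceSpec.
    private final Map<InternalResource, String> specByResource = new HashMap<>();
    private final Map<String, InternalResource> resourceBySpec = new HashMap<>();
    // Hypothetical stand-in for Resource.newInstance, used for outgoing resources.
    private final BiFunction<Long, Integer, ContainerResource> factory;

    ResourceAdapterSketch(BiFunction<Long, Integer, ContainerResource> factory) {
        this.factory = factory;
    }

    void register(String spec, ContainerResource resource) {
        InternalResource key = InternalResource.of(resource); // convert on the way in
        specByResource.put(key, spec);
        resourceBySpec.put(spec, key);
    }

    Optional<String> specFor(ContainerResource resource) {
        return Optional.ofNullable(specByResource.get(InternalResource.of(resource)));
    }

    Optional<ContainerResource> resourceFor(String spec) {
        // Convert back on the way out.
        return Optional.ofNullable(resourceBySpec.get(spec))
                .map(r -> factory.apply(r.memory, r.vcores));
    }
}
```

Because {{InternalResource}} is owned by Flink, its {{equals}} and {{hashCode}} stay consistent regardless of which Hadoop version, and hence which {{Resource}} implementation, is on the classpath.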



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
