[jira] [Commented] (FLINK-18681) The jar package version conflict causes the task to continue to increase and grab resources

Xintong Song (Jira) Tue, 28 Jul 2020 22:36:18 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166878#comment-17166878
 ]


Xintong Song commented on FLINK-18681:
--------------------------------------

[~apach...@163.com], thanks for providing the screenshot and logs.

I found the following warnings in the Yarn RM log.
{code:java}
2020-07-22 17:54:57,155 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wangty IP=x.x.x.61 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1590424616102_556340 CONTAINERID=container_1590424616102_556340_01_000002
2020-07-22 17:54:58,157 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wangty IP=x.x.x.61 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1590424616102_556340 CONTAINERID=container_1590424616102_556340_01_000003
2020-07-22 17:54:59,160 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wangty IP=x.x.x.61 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1590424616102_556340 CONTAINERID=container_1590424616102_556340_01_000004
{code}
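
As a quick sanity check on audit entries like the ones above, the following is a minimal sketch (not part of Flink or Hadoop) that tests whether the CONTAINERID in an audit line embeds the same cluster timestamp and application sequence number as the APPID. The class name {{AuditLineCheck}} and its method are hypothetical helpers introduced only for illustration. A match only rules out an obviously foreign container id; it does not explain why the RM rejected the release.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not a Flink or Hadoop class.
public class AuditLineCheck {

    // application_<clusterTimestamp>_<appSequence>
    private static final Pattern APP =
            Pattern.compile("APPID=application_(\\d+)_(\\d+)");

    // container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<containerSequence>
    private static final Pattern CONTAINER =
            Pattern.compile("CONTAINERID=container_(?:e\\d+_)?(\\d+)_(\\d+)_(\\d+)_(\\d+)");

    /** Returns true if both ids are present and refer to the same application. */
    public static boolean containerBelongsToApp(String auditLine) {
        Matcher app = APP.matcher(auditLine);
        Matcher container = CONTAINER.matcher(auditLine);
        if (!app.find() || !container.find()) {
            return false; // one of the ids is missing or malformed
        }
        // Compare numerically so differences in zero padding do not matter.
        boolean sameCluster =
                Long.parseLong(app.group(1)) == Long.parseLong(container.group(1));
        boolean sameApp =
                Long.parseLong(app.group(2)) == Long.parseLong(container.group(2));
        return sameCluster && sameApp;
    }

    public static void main(String[] args) {
        String line = "RESULT=FAILURE APPID=application_1590424616102_556340 "
                + "CONTAINERID=container_1590424616102_556340_01_000002";
        System.out.println(containerBelongsToApp(line)); // prints true
    }
}
```

For the three entries above the ids do match the application, so the rejection does not appear to come from a container id that belongs to a different application.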

It shows that Flink did release the containers, but the operations were 
rejected by the Yarn RM. The API Flink uses for releasing containers is 
{{AMRMClientAsync#releaseAssignedContainer}}, called via the same client that 
successfully allocated the containers from Yarn.
{code:java}
  /**
   * Release containers assigned by the Resource Manager. If the app cannot use
   * the container or wants to give up the container then it can release them.
   * The app needs to make new requests for the released resource capability if
   * it still needs it. eg. it released non-local resources
   * @param containerId
   */
  public abstract void releaseAssignedContainer(ContainerId containerId);
{code}

It seems to me that the Hadoop API did not work as expected. I would suggest 
seeking help from the Apache Hadoop community.

Pulling in [~Tao Yang], who is an Apache Hadoop committer and an expert in Yarn.

> The jar package version conflict causes the task to continue to increase and 
> grab resources
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18681
>                 URL: https://issues.apache.org/jira/browse/FLINK-18681
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: wangtaiyang
>            Priority: Major
>         Attachments: appId.log, dependency.log, 
> image-2020-07-28-15-32-51-851.png, 
> yarn-hadoop-resourcemanager-x.x.x.15.log.2020-07-22-17.log
>
>
> When I submit a Flink job to Yarn, the default resource configuration is 
> 1 GB and 1 core, but in practice the job keeps acquiring more resources: 
> 2 cores, 3 cores, and so on, up to 200 cores. I then looked at the JM log 
> and found the following error:
> {code:java}
> // code placeholder
> java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
>  at org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.<clinit>(CommandLineOptions.java:28) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>  at org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>  at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_191]
> .......
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.entrypoint.parser.CommandLineOptions
>  at org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>  at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_191]
>  at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553) ~[?:1.8.0_191]
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_191]
> {code}
> Finally, I confirmed that this is caused by a commons-cli version 
> conflict, but the failing job does not stop and keeps grabbing more 
> resources. Is this a bug?
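
For reference, a version conflict like the one described in the quoted report can be checked from inside the affected JVM. The following is a minimal sketch, assuming it is run with the same classpath as the failing process; {{ClasspathProbe}} and its methods are hypothetical helpers, not Flink or commons-cli APIs. It reports which location a class was loaded from and whether a given public method (here {{Option.builder(String)}}, the one named in the {{NoSuchMethodError}}) is present.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, for illustration only.
public class ClasspathProbe {

    /** Returns true if the class can be loaded and exposes the given public method. */
    public static boolean hasPublicMethod(String className, String methodName,
                                          Class<?>... parameterTypes) {
        try {
            Class.forName(className).getMethod(methodName, parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false; // class missing, method missing, or class failed to link
        }
    }

    /** Returns the jar or directory the class was loaded from, if known. */
    public static String loadedFrom(String className) {
        try {
            CodeSource source =
                    Class.forName(className).getProtectionDomain().getCodeSource();
            return source == null
                    ? "(bootstrap or unknown)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String option = "org.apache.commons.cli.Option";
        System.out.println("loaded from: " + loadedFrom(option));
        System.out.println("has Option.builder(String): "
                + hasPublicMethod(option, "builder", String.class));
    }
}
```

If the second line prints false while the first points at a jar, that jar most likely carries an older commons-cli than the one Flink was compiled against.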



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
