[ https://issues.apache.org/jira/browse/FLINK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167590#comment-17167590 ]
Xintong Song commented on FLINK-18681:
--------------------------------------

[~Tao Yang],

The containers are allocated but never started. For each allocated container, an error occurs in the AM before the container is started, so the AM releases the not-yet-started container and requests a new one. Basically, the AM keeps looping: request -> on allocated -> release without start -> request.

The problem is that all the requests succeed, but all the releases are ignored. As a result, the user observes the resources held by this application increasing continuously. From this observation, it seems that each container remains allocated for quite a while, until it times out for not being started.

Any insight into what may cause a container not to be recognized when releasing a just-allocated, not-yet-started container?

> The jar package version conflict causes the task to continue to increase and grab resources
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18681
>                 URL: https://issues.apache.org/jira/browse/FLINK-18681
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: wangtaiyang
>            Priority: Major
>         Attachments: appId.log, dependency.log,
>                      image-2020-07-28-15-32-51-851.png,
>                      yarn-hadoop-resourcemanager-x.x.x.15.log.2020-07-22-17.log
>
>
> When I submit a Flink task to YARN, the default resource configuration is
> 1G & 1 core, but in fact the task keeps acquiring more resources: 2 cores,
> 3 cores, and so on ... 200 cores ...
> Then I went to look at the JM log and found the following error:
> {code:java}
> // code placeholder
> java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
>     at org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.<clinit>(CommandLineOptions.java:28) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_191]
> .......
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.entrypoint.parser.CommandLineOptions
>     at org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_191]
>     at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553) ~[?:1.8.0_191]
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_191]
> {code}
> Finally, I confirmed that this is caused by a commons-cli version conflict,
> but the task that reports the error does not stop and keeps grabbing more
> resources. Is this a bug?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)