[ 
https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tartarus updated FLINK-19022:
-----------------------------
    Description: 
My job appeared that JM could not start normally, and the JM container was 
finally killed by RM.

In the end, I found through debug that AkkaRpcActor failed to start because the 
version of yarn in my job was incompatible with the version in the cluster.

[AkkaRpcActor exception 
handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550]

I add log printing here,and then found the specific problem.
{code:java}
2020-08-21 21:31:16,985 ERROR 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState 
[flink-akka.actor.default-dispatcher-4]  - Could not start RpcEndpoint 
resourcemanager.
java.lang.NoSuchMethodError: 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto;
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
        at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
        at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
        at 
org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229)
        at 
org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262)
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204)
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192)
        at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
        at akka.actor.ActorCell.invoke(ActorCell.scala:561)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code}
Should we add logs here to help find problems?

 
 [ |http://dict.youdao.com/search?q=http&keyfrom=chrome.extension]

  was:
My task appeared that JM could not start normally, and the JM container was 
finally killed by RM.

In the end, I found through debug that AkkaRpcActor failed to start because the 
version of yarn in my job was incompatible with the version in the cluster.

[AkkaRpcActor exception 
handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550]

I add log printing here,and then found the specific problem.
{code:java}
2020-08-21 21:31:16,985 ERROR 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState 
[flink-akka.actor.default-dispatcher-4]  - Could not start RpcEndpoint 
resourcemanager.
java.lang.NoSuchMethodError: 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto;
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
        at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
        at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
        at 
org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229)
        at 
org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262)
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204)
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192)
        at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
        at akka.actor.ActorCell.invoke(ActorCell.scala:561)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code}
Should we add logs here to help find problems?

 
[ |http://dict.youdao.com/search?q=http&keyfrom=chrome.extension]


> AkkaRpcActor failed to start but no exception information
> ---------------------------------------------------------
>
>                 Key: FLINK-19022
>                 URL: https://issues.apache.org/jira/browse/FLINK-19022
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: tartarus
>            Priority: Major
>
> My job appeared that JM could not start normally, and the JM container was 
> finally killed by RM.
> In the end, I found through debug that AkkaRpcActor failed to start because 
> the version of yarn in my job was incompatible with the version in the 
> cluster.
> [AkkaRpcActor exception 
> handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550]
> I add log printing here,and then found the specific problem.
> {code:java}
> 2020-08-21 21:31:16,985 ERROR 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState 
> [flink-akka.actor.default-dispatcher-4]  - Could not start RpcEndpoint 
> resourcemanager.
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto;
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
>       at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
>       at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
>       at 
> org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229)
>       at 
> org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262)
>       at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204)
>       at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192)
>       at 
> org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544)
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169)
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>       at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>       at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>       at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>       at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>       at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code}
> Should we add logs here to help find problems?
>  
>  [ |http://dict.youdao.com/search?q=http&keyfrom=chrome.extension]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to