This looks like your Kerberos server has started issuing invalid tokens to your JobManager. Can you share how your Kerberos settings are configured? This may also be related to how your KDC servers are configured.
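When you share the settings, it would also help to know whether the rejections always name the same application attempt and when they start. Below is a minimal Python sketch for pulling that out of the JobManager log. It assumes each warning sits on a single line in the format of the excerpt quoted below. The function and field names are hypothetical and are not part of Flink or Hadoop.

```python
import re
from typing import List, NamedTuple

# Hypothetical helper, not part of Flink or Hadoop.
# YARN formats attempt IDs as appattempt_<clusterTimestamp>_<appSequence>_<attemptNumber>.
ATTEMPT_RE = re.compile(r"appattempt_(\d+)_(\d+)_(\d+)")
# Assumes each log line starts with "YYYY-MM-DD HH:MM:SS,mmm", as in the quoted excerpt.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


class TokenRejection(NamedTuple):
    logged_at: str          # timestamp at the start of the log line
    cluster_timestamp: int  # ResourceManager start time encoded in the ID (ms since epoch)
    app_sequence: int       # application number within that ResourceManager run
    attempt: int            # application attempt number (1 = first attempt)


def find_token_rejections(lines: List[str]) -> List[TokenRejection]:
    """Return one entry per timestamped log line that reports an invalid AMRMToken."""
    rejections = []
    for line in lines:
        if "Invalid AMRMToken" not in line:
            continue
        ts = TIMESTAMP_RE.match(line)
        attempt_id = ATTEMPT_RE.search(line)
        # Stack-trace lines carry no timestamp; skip them to avoid double counting.
        if ts is None or attempt_id is None:
            continue
        cluster_ts, app_seq, attempt_no = (int(g) for g in attempt_id.groups())
        rejections.append(TokenRejection(ts.group(1), cluster_ts, app_seq, attempt_no))
    return rejections


if __name__ == "__main__":
    sample = [
        "2019-08-23 06:11:01,534 WARN org.apache.hadoop.security.UserGroupInformation - "
        "PriviledgedActionException as:XXXX (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException"
        "(org.apache.hadoop.security.token.SecretManager$InvalidToken): "
        "Invalid AMRMToken from appattempt_1564713228886_5299648_000001",
        "org.apache.hadoop.security.token.SecretManager$InvalidToken: "
        "Invalid AMRMToken from appattempt_1564713228886_5299648_000001",
    ]
    for r in find_token_rejections(sample):
        print(r.logged_at, r.app_sequence, r.attempt)
```

Knowing whether all rejections come from one attempt, and how long after submission they begin, would be useful context alongside the Kerberos and KDC settings.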
--
Rong

On Fri, Aug 23, 2019 at 7:00 AM Zhu Zhu <reed...@gmail.com> wrote:

> Hi Juan,
>
> Have you tried a Flink release built with Hadoop 2.7 or a later version?
> If you are using Flink 1.8/1.9, that would be the pre-bundled Hadoop 2.7+ jar,
> which can be found on the Flink download page.
>
> I think YARN-3103 is about AMRMClientImpl.class, and that class is in the
> Flink shaded Hadoop jar.
>
> Thanks,
> Zhu Zhu
>
> Juan Gentile <j.gent...@criteo.com> wrote on Fri, Aug 23, 2019 at 7:48 PM:
>
>> Hello!
>>
>> We are running Flink on YARN and we are currently getting the following
>> error:
>>
>> 2019-08-23 06:11:01,534 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:XXXX (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1564713228886_5299648_000001
>> 2019-08-23 06:11:01,535 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1564713228886_5299648_000001
>> 2019-08-23 06:11:01,536 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as: XXXX (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1564713228886_5299648_000001
>> 2019-08-23 06:11:01,581 WARN org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking ApplicationMasterProtocolPBClientImpl.allocate over rm0. Not retrying because Invalid or Cancelled Token
>> org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1564713228886_5299648_000001
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>     at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>>     at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>>     at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:288)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:206)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:188)
>>     at com.sun.proxy.$Proxy26.allocate(Unknown Source)
>>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
>>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1564713228886_5299648_000001
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
>>     at com.sun.proxy.$Proxy25.allocate(Unknown Source)
>>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>     ... 9 more
>>
>> The Flink cluster runs fine for a while, but after about a day we get this
>> error again. We haven't made any changes to our code, which is why it's hard
>> to understand why we suddenly started seeing this.
>>
>> We found this issue reported against YARN
>> (https://issues.apache.org/jira/browse/YARN-3103), but our version of YARN
>> already includes that fix.
>>
>> Any help would be appreciated.
>>
>> Thank you,
>>
>> Juan