2015-11-04 10:03:31,905 ERROR [Delegation Token Refresh Thread-0] hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
Could it be related to HDFS-7931?

On Wed, Nov 4, 2015 at 12:30 PM, Chen Song <chen.song...@gmail.com> wrote:

> After a bit more investigation, I found that it could be related to
> impersonation on a kerberized cluster.
>
> Our job is started with the following command:
>
> /usr/lib/spark/bin/spark-submit --master yarn-client --principal [principal] --keytab [keytab] --proxy-user [proxied_user] ...
>
> In the application master's log, at startup:
>
> 2015-11-03 16:03:41,602 INFO [main] yarn.AMDelegationTokenRenewer (Logging.scala:logInfo(59)) - Scheduling login from keytab in 64789744 millis.
>
> Later on, when the delegation token renewer thread kicks in, it tries to
> re-login as the specified principal with new credentials and to write the
> new credentials over to the directory where the current user's credentials
> are stored. However, with impersonation, because the current user is a
> different user from the principal user, it fails with a permission error.
>
> 2015-11-04 10:03:31,366 INFO [Delegation Token Refresh Thread-0] yarn.AMDelegationTokenRenewer (Logging.scala:logInfo(59)) - Attempting to login to KDC using principal: principal/host@domain
> 2015-11-04 10:03:31,665 INFO [Delegation Token Refresh Thread-0] yarn.AMDelegationTokenRenewer (Logging.scala:logInfo(59)) - Successfully logged into KDC.
> 2015-11-04 10:03:31,702 INFO [Delegation Token Refresh Thread-0] yarn.YarnSparkHadoopUtil (Logging.scala:logInfo(59)) - getting token for namenode: hdfs://hadoop_abc/user/proxied_user/.sparkStaging/application_1443481003186_00000
> 2015-11-04 10:03:31,904 INFO [Delegation Token Refresh Thread-0] hdfs.DFSClient (DFSClient.java:getDelegationToken(1025)) - Created HDFS_DELEGATION_TOKEN token 389283 for principal on ha-hdfs:hadoop_abc
> 2015-11-04 10:03:31,905 ERROR [Delegation Token Refresh Thread-0] hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
> 2015-11-04 10:03:31,944 WARN [Delegation Token Refresh Thread-0] security.UserGroupInformation (UserGroupInformation.java:doAs(1674)) - PriviledgedActionException as:proxy-user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
> 2015-11-04 10:03:31,945 WARN [Delegation Token Refresh Thread-0] ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
> 2015-11-04 10:03:31,945 WARN [Delegation Token Refresh Thread-0] security.UserGroupInformation (UserGroupInformation.java:doAs(1674)) - PriviledgedActionException as:proxy-user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
> 2015-11-04 10:03:31,963 WARN [Delegation Token Refresh Thread-0] yarn.YarnSparkHadoopUtil (Logging.scala:logWarning(92)) - Error while attempting to list files from application staging dir
> org.apache.hadoop.security.AccessControlException: Permission denied: user=principal, access=READ_EXECUTE, inode="/user/proxy-user/.sparkStaging/application_1443481003186_00000":proxy-user:proxy-user:drwx------
>
> Can someone confirm my understanding is right? The relevant class is below:
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala
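A minimal sketch of the failure sequence the AM log above suggests, using the standard Hadoop UserGroupInformation and FileSystem APIs. The object and method names here (RenewalSketch, renewAs) are illustrative, not from the Spark source:

import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object RenewalSketch {
  def renewAs(principal: String, keytab: String, stagingDir: String): Unit = {
    // Re-login from the keytab, as the refresh thread does.
    val keytabUgi =
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
    keytabUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(new Configuration())
        // Fetching a fresh delegation token as the principal succeeds
        // ("Created HDFS_DELEGATION_TOKEN ... for principal" above)...
        fs.addDelegationTokens(principal, keytabUgi.getCredentials)
        // ...but the staging dir is proxy-user:proxy-user:drwx------ and
        // this action runs as the principal, not the proxied user, so the
        // listing fails with AccessControlException.
        fs.listStatus(new Path(stagingDir))
      }
    })
  }
}

If that reading is right, the staging-dir access would presumably need to happen as the proxied user (e.g. via UserGroupInformation.createProxyUser over the keytab UGI) rather than as the keytab principal.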
>
> Chen
>
> On Tue, Nov 3, 2015 at 11:57 AM, Chen Song <chen.song...@gmail.com> wrote:
>
>> We saw the following error in a Spark Streaming job. Our job is running
>> on YARN with Kerberos enabled.
>>
>> First, the warnings below were printed out. I only pasted a few, but they
>> were repeated hundreds/thousands of times:
>>
>> 15/11/03 14:43:07 WARN UserGroupInformation: PriviledgedActionException as:[kerberos principal] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>> 15/11/03 14:43:07 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>> 15/11/03 14:43:07 WARN UserGroupInformation: PriviledgedActionException as:[kerberos principal] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>> 15/11/03 14:43:07 WARN UserGroupInformation: PriviledgedActionException as:[kerberos principal] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>> 15/11/03 14:43:07 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
>>
>> It seems to have something to do with renewal of the token, and it tried
>> to connect to a standby namenode.
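As an aside on those standby warnings: an HDFS HA client is normally pointed at the logical nameservice and configured with a failover proxy provider, so a StandbyException should make it retry the other namenode rather than loop. A hedged sketch of that client-side configuration follows; only the hadoop_abc nameservice name comes from the logs above, while the namenode ids and hostnames are made up:

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object HaClientSketch {
  def haFileSystem(): FileSystem = {
    val conf = new Configuration()
    // Logical nameservice from the logs; everything below it is illustrative.
    conf.set("dfs.nameservices", "hadoop_abc")
    conf.set("dfs.ha.namenodes.hadoop_abc", "nn1,nn2")
    conf.set("dfs.namenode.rpc-address.hadoop_abc.nn1", "nn1.example.com:8020")
    conf.set("dfs.namenode.rpc-address.hadoop_abc.nn2", "nn2.example.com:8020")
    conf.set(
      "dfs.client.failover.proxy.provider.hadoop_abc",
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
    // With this in place the client resolves hdfs://hadoop_abc through the
    // failover proxy provider instead of pinning one physical namenode, so a
    // StandbyException triggers a retry against the other namenode.
    FileSystem.get(new URI("hdfs://hadoop_abc"), conf)
  }
}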
>> Then the following error was thrown:
>>
>> 15/11/03 14:43:20 ERROR Utils: Uncaught exception in thread Delegation Token Refresh Thread-0
>> java.lang.StackOverflowError
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired(ExecutorDelegationTokenUpdater.scala:89)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1.run(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired(ExecutorDelegationTokenUpdater.scala:79)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1.run(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired(ExecutorDelegationTokenUpdater.scala:79)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater$$anon$1$$anonfun$run$1.apply(ExecutorDelegationTokenUpdater.scala:49)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>>
>> Again, the above stack trace was repeated hundreds/thousands of times,
>> which explains why a StackOverflowError was produced.
>>
>> My question is:
>>
>> * If the HDFS active namenode fails over during the job, does the client,
>> the next time token renewal is needed, always try to connect to the same
>> namenode the token was created against? Is that true and expected? If so,
>> how should a Spark Streaming job handle namenode failover?
>>
>> Thanks for your feedback in advance.
>>
>> --
>> Chen Song
>
> --
> Chen Song
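Reading the repeated frames in that trace, the cycle run(49) -> logUncaughtExceptions -> updateCredentialsIfRequired(79) -> run(49) suggests the updater retries a failed refresh by re-invoking itself on the same thread instead of rescheduling, so each failed attempt against the standby namenode adds a full set of frames until the stack overflows. A minimal sketch of that retry pattern, with hypothetical names, not the actual Spark source:

object SameThreadRetrySketch {
  // Stand-in for the token refresh RPC that keeps hitting the standby NN.
  private def refreshTokens(): Unit =
    throw new RuntimeException(
      "Operation category READ is not supported in state standby")

  def updateCredentialsIfRequired(): Unit = {
    try {
      refreshTokens()
    } catch {
      case _: Exception =>
        // Retry by direct recursion on the same thread. Because the call
        // sits inside a catch block, scalac cannot eliminate the tail call,
        // so every failed attempt adds frames until the thread dies with
        // StackOverflowError, matching the trace above. Rescheduling the
        // retry on the refresh timer would keep the stack flat.
        updateCredentialsIfRequired()
    }
  }
}

Calling SameThreadRetrySketch.updateCredentialsIfRequired() while the namenode stays in standby eventually throws StackOverflowError, which matches the behavior reported above.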