Yongjun Zhang created HDFS-6536:
-----------------------------------

             Summary: FileSystem.Cache.closeAll() threw an exception due to authentication failure at the end of a webhdfs client session
                 Key: HDFS-6536
                 URL: https://issues.apache.org/jira/browse/HDFS-6536
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client, webhdfs
    Affects Versions: 2.4.0
            Reporter: Yongjun Zhang

With the small client program below, an exception is thrown at the end of the client run when it is run as user "root". The cluster is HA with security enabled, and the client config sets:

{code}
<property>
  <name>fs.defaultFS</name>
  <value>webhdfs://ns1</value>
</property>
{code}

The client program:

{code}
import java.io.IOException;
import java.security.PrivilegedAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class kclient1 {
  public static void main(String[] args) throws IOException {
    final Configuration conf = new Configuration();
    // a non-root user
    final UserGroupInformation ugi =
        UserGroupInformation.getUGIFromTicketCache("/tmp/krb5cc_496", "h...@xyz.com");
    System.out.println("Starting");
    ugi.doAs(new PrivilegedAction<Object>() {
      @Override
      public Object run() {
        try {
          FileSystem fs = FileSystem.get(conf);
          String renewer = "abcdefg";
          fs.addDelegationTokens(renewer, ugi.getCredentials());
          // Just to prove that we connected with right credentials.
          fs.getFileStatus(new Path("/"));
          return fs.getDelegationToken(renewer);
        } catch (Exception e) {
          e.printStackTrace();
          return null;
        }
      }
    });
    System.out.println("THE END");
  }
}
{code}

Output:

{code}
[root@yjzc5w-1 tmp2]# hadoop --config /tmp2/conf jar kclient1.jar kclient1.kclient1
Starting
14/06/14 20:38:51 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
14/06/14 20:38:52 INFO web.WebHdfsFileSystem: Retrying connect to namenode: yjzc5w-2.xyz.com/172.26.3.87:20101. Already tried 0 time(s); retry policy is org.apache.hadoop.io.retry.RetryPolicies$FailoverOnNetworkExceptionRetry@1a92210, delay 0ms.
To prove that connection with right credentials to get file status updated updated 7
THE END
14/06/14 20:38:53 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
14/06/14 20:38:53 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
14/06/14 20:38:53 INFO fs.FileSystem: FileSystem.Cache.closeAll() threw an exception:
java.io.IOException: Authentication failed, url=http://yjzc5w-2.xyz.com:20101/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&user.name=root&token=HAAEaGRmcwRoZGZzAIoBRp2bNByKAUbBp7gcbBQUD6vWmRYJRv03XZj7Jajf8PU8CB8SV0VCSERGUyBkZWxlZ2F0aW9uC2hhLWhkZnM6bnMx
[root@yjzc5w-1 tmp2]#
{code}

We can see that the exception is thrown at the end of the client run.

I found that the problem is that at the end of the client run, much as a C++ destructor is called at the end of an object's scope, the tokens stored in the filesystem cache get cancelled with the following call:

{code}
final class TokenAspect<T extends FileSystem & Renewable> {
  @InterfaceAudience.Private
  public static class TokenManager extends TokenRenewer {
    @Override
    public void cancel(Token<?> token, Configuration conf) throws IOException {
      getInstance(token, conf).cancelDelegationToken(token); <==
    }
{code}

where getInstance(token, conf) creates a FileSystem as user "root" and then calls cancelDelegationToken on the server side. However, the server doesn't have the "root" credential, so it throws this exception.

When I run the same program as user "hdfs", it works fine. I think that if we ran the call to cancelDelegationToken as the user who created the token initially ("hdfs" in this case), it should work fine. However, the information about which user created the token is not available at that point.

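To make the idea concrete, here is a minimal stand-alone sketch of the difference between cancelling as the process user and cancelling as the token's creator. It does not use any Hadoop API: OwnedToken, cancelOnServer, closeAllAsProcessUser and closeAllAsOwner are all hypothetical names, the "server" is just a set of user names, and the sketch assumes the token carries its creator, which, as noted above, is not the case today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class CancelAsOwnerSketch {
  // Hypothetical stand-in for the server: only these users have valid credentials.
  static final Set<String> AUTHENTICATED_USERS = Collections.singleton("hdfs");

  // Hypothetical token that remembers which user created it (an assumption;
  // the report says this information is not available at cancel time).
  static final class OwnedToken {
    final String owner;
    boolean cancelled = false;

    OwnedToken(String owner) {
      this.owner = owner;
    }
  }

  // Simulates the server-side cancel: callers without credentials are rejected.
  static void cancelOnServer(OwnedToken token, String caller) {
    if (!AUTHENTICATED_USERS.contains(caller)) {
      throw new IllegalStateException("Authentication failed for user " + caller);
    }
    token.cancelled = true;
  }

  // Behaviour described in the report: every token is cancelled as the process user.
  // Returns the number of cancel calls that failed.
  static int closeAllAsProcessUser(List<OwnedToken> tokens, String processUser) {
    int failures = 0;
    for (OwnedToken token : tokens) {
      try {
        cancelOnServer(token, processUser);
      } catch (IllegalStateException e) {
        failures++;
      }
    }
    return failures;
  }

  // Proposed behaviour: each token is cancelled as the user who created it.
  // Returns the number of cancel calls that failed.
  static int closeAllAsOwner(List<OwnedToken> tokens) {
    int failures = 0;
    for (OwnedToken token : tokens) {
      try {
        cancelOnServer(token, token.owner);
      } catch (IllegalStateException e) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    List<OwnedToken> tokens = new ArrayList<>();
    tokens.add(new OwnedToken("hdfs"));
    System.out.println("failures as root:  " + closeAllAsProcessUser(tokens, "root"));
    System.out.println("failures as owner: " + closeAllAsOwner(tokens));
  }
}
```

In this toy model the cancel issued as "root" is rejected, while the cancel issued as the token's creator goes through. Whether the same holds in the real client depends on how the creating user could be recorded and restored when the cached filesystems are closed.
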
Hi [~daryn], I wonder if you could give a quick comment? I'd really appreciate it!



--
This message was sent by Atlassian JIRA
(v6.2#6252)