[
https://issues.apache.org/jira/browse/NIFI-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607429#comment-16607429
]
ASF GitHub Bot commented on NIFI-5557:
--------------------------------------
Github user jtstorck commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2971#discussion_r216037639
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/PutHDFS.java ---
@@ -269,13 +272,15 @@ public Object run() {
                         }
                         changeOwner(context, hdfs, configuredRootDirPath, flowFile);
                     } catch (IOException e) {
-                        if (!Strings.isNullOrEmpty(e.getMessage()) && e.getMessage().contains(String.format("Couldn't setup connection for %s", ugi.getUserName()))) {
-                            getLogger().error(String.format("An error occured while connecting to HDFS. Rolling back session, and penalizing flowfile %s",
-                                    flowFile.getAttribute(CoreAttributes.UUID.key())));
-                            session.rollback(true);
-                        } else {
-                            throw e;
-                        }
+                        boolean tgtExpired = hasCause(e, GSSException.class, gsse -> "Failed to find any Kerberos tgt".equals(gsse.getMinorString()));
--- End diff ---
@ekovacs After seeing the use of getMinorString here, I looked at
GSSException, and it looks like there are some error codes that could be used
to detect the actual cause, rather than string matching. Do getMajor and
getMinor return meaningful ints when these exceptions happen?
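To illustrate the suggestion above, here is a minimal sketch of a cause-chain check keyed on GSSException's numeric major code instead of the minor-status string. The names (TgtExpiryCheck, hasGssCause) and the choice of codes to match are hypothetical, not taken from the PR; NO_CRED and CREDENTIALS_EXPIRED are real constants on org.ietf.jgss.GSSException.

```java
import org.ietf.jgss.GSSException;
import java.util.function.Predicate;

public class TgtExpiryCheck {

    // Walk the cause chain looking for a GSSException that satisfies the
    // given predicate. Hypothetical helper, analogous in spirit to the
    // hasCause(...) call in the diff above.
    static boolean hasGssCause(Throwable t, Predicate<GSSException> test) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof GSSException && test.test((GSSException) c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the failure mode: an IOException whose root cause is a
        // GSSException raised because no usable Kerberos credential exists.
        Exception e = new java.io.IOException("Couldn't setup connection",
                new GSSException(GSSException.NO_CRED));

        // Match on the numeric major code rather than the message text.
        boolean tgtExpired = hasGssCause(e, gsse ->
                gsse.getMajor() == GSSException.NO_CRED
                        || gsse.getMajor() == GSSException.CREDENTIALS_EXPIRED);

        System.out.println(tgtExpired); // prints "true"
    }
}
```

The advantage over string matching is that getMajor() is part of the GSS-API contract, whereas the minor-status message text ("Failed to find any Kerberos tgt") comes from the underlying mechanism and could change between JDK releases.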
> PutHDFS "GSSException: No valid credentials provided" when krb ticket expires
> -----------------------------------------------------------------------------
>
> Key: NIFI-5557
> URL: https://issues.apache.org/jira/browse/NIFI-5557
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.5.0
> Reporter: Endre Kovacs
> Assignee: Endre Kovacs
> Priority: Major
>
> when using the *PutHDFS* processor in a kerberized environment, with flow
> traffic that is approximately as frequent as, or less frequent than, the
> lifetime of the principal's ticket, we see this in the log:
> {code:java}
> INFO [Timer-Driven Process Thread-4] o.a.h.io.retry.RetryInvocationHandler
> Exception while invoking getFileInfo of class
> ClientNamenodeProtocolTranslatorPB over host2/ip2:8020 after 13 fail over
> attempts. Trying to fail over immediately.
> java.io.IOException: Failed on local exception: java.io.IOException: Couldn't
> setup connection for [email protected] to host2.example.com/ip2:8020;
> Host Details : local host is: "host1.example.com/ip1"; destination host is:
> "host2.example.com":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
> at org.apache.hadoop.ipc.Client.call(Client.java:1479)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy134.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor344.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy135.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:254)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
> at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:222)
> {code}
> and the flowfile is routed to the failure relationship.
> *To reproduce:*
> Create a principal in your KDC with two minutes ticket lifetime,
> and set up a similar flow:
> {code:java}
> GetFile => putHDFS ----- success ----> logAttributes
>                   \
>                    ----- fail -------> logAttributes
> {code}
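> One way to create the short-lived principal mentioned above is via MIT
> Kerberos kadmin; this is a sketch, and the principal name, realm, password,
> and keytab path are placeholders for your environment (note the effective
> lifetime is also capped by the realm's max_life setting):
> {code}
> kadmin.local -q "addprinc -pw secret nifi/[email protected]"
> kadmin.local -q "modprinc -maxlife \"2 minutes\" nifi/[email protected]"
> # after kinit, klist shows the ticket's Expires timestamp
> {code}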
> Copy a file to the input directory of the GetFile processor. If the influx
> of flowfiles is much more frequent than the expiration time of the ticket:
> {code:java}
> watch -n 5 "cp book.txt /path/to/input"
> {code}
> then the flow will run without issue.
> If we adjust this to:
> {code:java}
> watch -n 121 "cp book.txt /path/to/input"
> {code}
> then we will observe this issue.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)