Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/3776

@Rucongzhang thanks for the contribution. I think I understand the problem and your solution, which I will recap. I also found YARN-2704 to be useful background.

*Problem*:
1. YARN log aggregation depends on an HDFS delegation token, which it obtains from container token storage, not from the UGI. In keytab mode, the Flink client doesn't upload any delegation tokens, causing log aggregation to fail.
2. The Flink cluster doesn't renew delegation tokens. Note: Flink does renew _Kerberos tickets_ using the keytab.
3. When the UGI contains both a delegation token and a Kerberos ticket, the delegation token is preferred. After the token expires, Flink does not fall back to using the ticket.

*Solution*:
1. Change the Flink client to upload delegation tokens. Addresses problem 1.
2. Change the Flink cluster to filter out the HDFS delegation token from the tokens loaded from storage when populating the UGI. Addresses problem 3.
3. Change the JM to propagate its stored tokens to the TM, rather than the tokens from the UGI (which were filtered in (2)).
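
To make solution steps 2 and 3 concrete, here is a minimal sketch of the filtering idea in plain Java. It is not the code from the PR: `StoredToken`, `tokensForUgi`, and `tokensForTaskManager` are hypothetical names, and the tokens are modeled as simple objects rather than Hadoop `Token` instances so the example stands alone. The only Hadoop-specific detail assumed is the token kind string `HDFS_DELEGATION_TOKEN`.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenFilterSketch {

    // Hypothetical stand-in for a Hadoop token; only kind and service matter here.
    static final class StoredToken {
        final String kind;
        final String service;

        StoredToken(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }
    }

    // Kind string used by Hadoop for HDFS delegation tokens.
    static final String HDFS_DELEGATION_KIND = "HDFS_DELEGATION_TOKEN";

    // Step 2: tokens to load into the UGI. The HDFS delegation token is dropped
    // so that HDFS access relies on the keytab-renewed Kerberos ticket instead.
    static List<StoredToken> tokensForUgi(List<StoredToken> stored) {
        List<StoredToken> result = new ArrayList<>();
        for (StoredToken token : stored) {
            if (!HDFS_DELEGATION_KIND.equals(token.kind)) {
                result.add(token);
            }
        }
        return result;
    }

    // Step 3: tokens the JM hands to the TM. This is the unfiltered stored set,
    // so the TM container still carries the token YARN log aggregation needs.
    static List<StoredToken> tokensForTaskManager(List<StoredToken> stored) {
        return new ArrayList<>(stored);
    }
}
```

The point of keeping two separate views is that the filtered set only affects what Flink itself uses for HDFS calls, while the container launch context continues to receive everything that was uploaded by the client.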