ruiliang created FLINK-39274:
--------------------------------

             Summary: TM cannot bypass the KDC login process, so the delegation tokens issued by the AM are not actually used
                 Key: FLINK-39274
                 URL: https://issues.apache.org/jira/browse/FLINK-39274
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.17.2
         Environment: flink on yarn
            Reporter: ruiliang


From the documentation, the Kerberos login configuration does not distinguish between the AM and the TM:
flink-conf.yaml
{code:java}
security.kerberos.login.keytab: xx.keytab
security.kerberos.login.principal: xx_principal{code}
launch_container.sh
{code:java}
# The AM has successfully issued the delegation tokens to this container.
export HADOOP_TOKEN_FILE_LOCATION="/data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/container_tokens"
..
# Yet the keytab file is still downloaded here as well.
export _REMOTE_KEYTAB_PATH="hdfs://xx/user/hiidoagent/.flink/application_1773803886076_15646/hiidoagent.keytab"
export HADOOP_USER_NAME="[email protected]"
export _LOCAL_KEYTAB_PATH="krb5.keytab"
export _KEYTAB_PRINCIPAL="hiidoagent"{code}
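A minimal sketch, assuming the container environment shown above, of how the two credential sources can be detected side by side. The environment variable names are copied from the launch_container.sh excerpt; the class and method names are hypothetical and not part of Flink. The check only illustrates the situation described in this issue: both a token file and a keytab reach the TM container.

{code:java}
import java.util.Map;

// Hypothetical helper (not part of Flink): reports which credential sources
// YARN localized into a container, using the variables from launch_container.sh.
final class ContainerCredentials {

    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    static final String REMOTE_KEYTAB_ENV = "_REMOTE_KEYTAB_PATH";
    static final String LOCAL_KEYTAB_ENV = "_LOCAL_KEYTAB_PATH";

    private ContainerCredentials() {}

    // True if a delegation token file was shipped to this container.
    static boolean hasDelegationTokens(Map<String, String> env) {
        return isSet(env.get(TOKEN_FILE_ENV));
    }

    // True if a keytab was shipped (remote source and local name both present).
    static boolean hasKeytab(Map<String, String> env) {
        return isSet(env.get(REMOTE_KEYTAB_ENV)) && isSet(env.get(LOCAL_KEYTAB_ENV));
    }

    // The case reported here: tokens are available, but a keytab is shipped too.
    static boolean keytabShippedDespiteTokens(Map<String, String> env) {
        return hasDelegationTokens(env) && hasKeytab(env);
    }

    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("tokens=" + hasDelegationTokens(env) + " keytab=" + hasKeytab(env));
    }
}
{code}

Run inside a TM container, this would print both flags; for the container above, both would be true.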

TM log
{code:java}
2026-03-18 17:49:23,394 INFO  org.apache.flink.runtime.state.changelog.StateChangelogStorageLoader [] - StateChangelogStorageLoader initialized with shortcut names {memory,filesystem}.
2026-03-18 17:49:23,441 INFO  org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Attempting to login to KDC using principal: hiidoagent keytab: /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO  org.apache.hadoop.security.UserGroupInformation              [] - Login successful for user hiidoagent using keytab file /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO  org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Successfully logged into KDC
2026-03-18 17:49:23,719 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Starting TGT renewal task
2026-03-18 17:49:23,719 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - TGT renewal task started and reoccur in 60000 ms
2026-03-18 17:49:23,719 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Hadoop user set to [email protected] (auth:KERBEROS)
2026-03-18 17:49:23,720 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos security is enabled.
2026-03-18 17:49:23,720 INFO  org.apache.flink.runtime.security.modules.HadoopModule       [] - Kerberos credentials are valid.
2026-03-18 17:49:23,726 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /data1/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/jaas-7581660068545285667.conf.
...
2026-03-18 17:49:25,228 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2026-03-18 17:49:25,229 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Loading delegation token receivers
2026-03-18 17:49:25,232 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hadoopfs loaded and initialized
2026-03-18 17:49:25,233 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hbase loaded and initialized {code}
Code:
[https://github.com/apache/flink/blob/6fc5c97ec3a89975ee44b1b084efc8fbc25c73ee/flink-yarn/src/main/java/org/apache/flink/yarn/YarnTaskExecutorRunner.java#L132]
Looking at the source code, there is no configuration option or conditional check at this point: whenever a keytab and principal are shipped, the TM always logs in to the KDC with them. This behavior should be made configurable instead of being hard-coded.
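One way to make this controllable, sketched under assumptions: a TM-side switch that skips the keytab login only when the user opts out and the AM has shipped a token file, and otherwise keeps today's behavior. The option key "security.kerberos.taskmanager.keytab-login.enabled" is hypothetical (Flink 1.17.2 has no such option), the configuration is modeled as a plain Map rather than Flink's Configuration class, and the environment variable names come from launch_container.sh above. Skipping the keytab assumes the TM can then rely on the delegation tokens alone.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Flink code): decides whether the TM should log in with the keytab.
final class TaskManagerKeytabPolicy {

    // HYPOTHETICAL option; defaults to true so current behavior is preserved.
    static final String TM_KEYTAB_LOGIN_ENABLED = "security.kerberos.taskmanager.keytab-login.enabled";

    static final String KEYTAB_OPTION = "security.kerberos.login.keytab";
    static final String PRINCIPAL_OPTION = "security.kerberos.login.principal";

    private TaskManagerKeytabPolicy() {}

    // Keytab login is skipped only if the user opted out AND a token file exists;
    // otherwise the TM would be left without any credentials.
    static boolean shouldLoginWithKeytab(Map<String, String> flinkConf, Map<String, String> env) {
        boolean enabled = Boolean.parseBoolean(flinkConf.getOrDefault(TM_KEYTAB_LOGIN_ENABLED, "true"));
        String tokenFile = env.get("HADOOP_TOKEN_FILE_LOCATION");
        boolean hasTokens = tokenFile != null && !tokenFile.isEmpty();
        boolean hasKeytab = env.get("_LOCAL_KEYTAB_PATH") != null && env.get("_KEYTAB_PRINCIPAL") != null;
        return hasKeytab && (enabled || !hasTokens);
    }

    // Returns a copy of the configuration with the keytab options set or cleared.
    static Map<String, String> configure(Map<String, String> flinkConf, Map<String, String> env, String localKeytabPath) {
        Map<String, String> result = new HashMap<>(flinkConf);
        if (shouldLoginWithKeytab(flinkConf, env)) {
            result.put(KEYTAB_OPTION, localKeytabPath);
            result.put(PRINCIPAL_OPTION, env.get("_KEYTAB_PRINCIPAL"));
        } else {
            // Clear client-side values so no KDC login is attempted from this container.
            result.remove(KEYTAB_OPTION);
            result.remove(PRINCIPAL_OPTION);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(TM_KEYTAB_LOGIN_ENABLED, "false");
        System.out.println("keytab login: " + shouldLoginWithKeytab(conf, System.getenv()));
    }
}
{code}

Defaulting the switch to true keeps existing deployments unchanged, and falling back to the keytab when no token file is present avoids a TM that has no credentials at all.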

KDC
The number of concurrent KDC logins = number of Flink applications * number of containers per application, because every container, not only the AM, logs in with the keytab when it starts.
With a large number of short-lived Flink jobs, this puts heavy pressure on the KDC: it can become severely slow, which in turn affects the security and stability of the whole cluster.
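Reading the formula above as the sum of containers over all applications (applications * average containers per application), a back-of-the-envelope estimate can be sketched as follows. It assumes exactly one keytab login per container at startup and ignores TGT re-logins; the workload in main() is hypothetical and only contrasts "every container logs in" with "only the AM logs in".

{code:java}
import java.util.Arrays;

// Back-of-the-envelope sketch. ASSUMPTION: one keytab login per container at startup.
final class KdcLoginEstimate {

    private KdcLoginEstimate() {}

    // Every container (AM/JobManager and TaskManagers) logs in with the keytab.
    static long startupLogins(int[] containersPerApp) {
        long total = 0;
        for (int containers : containersPerApp) {
            total += containers;
        }
        return total;
    }

    // Only the AM/JobManager logs in; TMs would rely on delegation tokens.
    static long startupLoginsAmOnly(int[] containersPerApp) {
        return containersPerApp.length;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 500 short-lived applications, 1 JM + 20 TMs each.
        int[] apps = new int[500];
        Arrays.fill(apps, 21);
        System.out.println(startupLogins(apps));       // 10500
        System.out.println(startupLoginsAmOnly(apps)); // 500
    }
}
{code}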

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
