[ https://issues.apache.org/jira/browse/FLINK-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wgcn updated FLINK-12728:
-------------------------
Description:
The job cannot restart when a Flink job has been running for a long time and a TaskManager then restarts. The AM log shows that the AM keeps requesting TaskManager containers. The log in NodeManager shows that the newly requested containers cannot download files from HDFS because of Kerberos. I configured the keytab on the flink-client machine as follows, and the keytab exists:

security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /data/sysdir/knit/user/.flink.keytab
security.kerberos.login.principal: flink/client-docker-201-53.hadoop.lq@HADOOP.LQ2.

The AM and NodeManager logs are shown below.

was:
The job cannot restart when a Flink job has been running for a long time and a TaskManager then restarts. The AM log shows that the AM keeps requesting TaskManager containers. Log in NodeManager shows that the newly requested containers cannot download files from HDFS because of Kerberos. I configured the keytab on the flink-client machine as follows, and the keytab exists:

security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /data/sysdir/knit/user/.flink.keytab
security.kerberos.login.principal: flink/client-docker-201-53.hadoop.lq@HADOOP.LQ2.

The AM and NodeManager logs are shown below.
> taskmanager container can't launch on nodemanager machine because of kerberos
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-12728
>                 URL: https://issues.apache.org/jira/browse/FLINK-12728
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.7.2
>         Environment: linux
>                      jdk8
>                      hadoop 2.7.2
>                      flink 1.7.2
>            Reporter: wgcn
>            Priority: Major
>         Attachments: AM.log, NM.log
>
>
> The job cannot restart when a Flink job has been running for a long time and
> a TaskManager then restarts. The AM log shows that the AM keeps requesting
> TaskManager containers. The log in NodeManager shows that the newly requested
> containers cannot download files from HDFS because of Kerberos. I configured
> the keytab on the flink-client machine as follows, and the keytab exists:
>
> security.kerberos.login.use-ticket-cache: false
> security.kerberos.login.keytab: /data/sysdir/knit/user/.flink.keytab
> security.kerberos.login.principal: flink/client-docker-201-53.hadoop.lq@HADOOP.LQ2.
>
> The AM and NodeManager logs are shown below.
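
The three Kerberos settings quoted in the description can be sanity-checked on the client machine with a short script. The following is only a minimal sketch under stated assumptions: the function names `read_flink_conf` and `check_kerberos_settings` are hypothetical, the parser handles only flat `key: value` lines as they appear in the report, and a missing `use-ticket-cache` entry is assumed to mean `true`. It confirms that the client-side settings are present and that the keytab file exists; it does not show whether newly requested containers can authenticate to HDFS.

```python
import os

USE_TICKET_CACHE = "security.kerberos.login.use-ticket-cache"
KEYTAB = "security.kerberos.login.keytab"
PRINCIPAL = "security.kerberos.login.principal"


def read_flink_conf(text):
    """Parse flat 'key: value' lines, skipping blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_kerberos_settings(conf):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    # Assumption: an absent entry is treated as 'true' (ticket cache in use).
    if conf.get(USE_TICKET_CACHE, "true").lower() != "false":
        problems.append("ticket cache is not disabled")
    keytab = conf.get(KEYTAB)
    if not keytab:
        problems.append("no keytab configured")
    elif not os.path.isfile(keytab):
        problems.append("keytab file not found: " + keytab)
    if not conf.get(PRINCIPAL):
        problems.append("no principal configured")
    return problems
```

To use it, read the client's configuration file into a string, pass it through `read_flink_conf`, and print whatever `check_kerberos_settings` returns. An empty list matches the reporter's statement that the keytab is configured and exists.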