Hi,

As a general suggestion, please use the official releases, since we're not
able to analyze custom code containing cherry-picks and potentially
hand-made conflict resolution.

When you state "needs to be restarted due to an exception", what kind of
exception are you referring to?
I mean, what kind of file operation is happening? A full stack trace would
be useful too.

The reason I'm asking is that there are features which, by design, are not
expected to work, like YARN log aggregation,
but Flink data processing must work after the TM has registered itself at
the JM. When that registration happens,
the TM receives a set of fresh tokens which must be used for data
processing.

BR,
G


> From: dpp <pengpeng.d...@foxmail.com>
> Date: Sat, Aug 10, 2024 at 6:42 AM
> Subject: Flink jobs failed in Kerberos delegation token
> To: user <user@flink.apache.org>
>
>
> Hello, I am currently using Flink version 1.15.2 and have encountered an
> issue with the HDFS delegation token expiring after 7 days in a Kerberos
> scenario.
> I have seen a new delegation token framework (
> https://issues.apache.org/jira/browse/FLINK-21232) and I have merged the
> code commits from 1 to 12 (Sub-Tasks 1-12) in the link into my Flink
> version 1.15.2.
> Now, it is possible to refresh the delegation token periodically. However,
> after 7 days, if the JobManager or TaskManager needs to be restarted due to
> an exception, I found that the Yarn container used to start JM/TM still
> uses the HDFS_DELEGATION_KIND that was generated the first time the job was
> submitted. It also reports an error similar to 'token
> (HDFS_DELEGATION_TOKEN token 31615466 for xx) can't be found in cache'.
> So, the new delegation token framework did not take effect. I'm using the
> default method of Flink and delegation tokens are not managed elsewhere.
> Could anyone help me with this issue? Thank you very much.
>
