+1 on the 7-day expiry explanation; this is most likely the cause.
I hit the 7-day expiry issue with a previous version of Flink that didn't
support keytabs. I am currently running Flink 1.3 with keytabs (it has been
running fine for 2 days now); I will update after the 7-day mark.

Thanks,
Prabhu

On Thu, Aug 17, 2017 at 11:06 AM, Eron Wright <eronwri...@gmail.com> wrote:

> Raja,
> According to those configuration values, the delegation token would be
> automatically renewed every 24 hours, then expire entirely after 7 days.
> You say that the job ran without issue for 'a few days'. Can we conclude
> that the job hit the 7-day DT expiration?
>
> Flink supports the use of Kerberos keytabs as an alternative to delegation
> tokens for exactly this reason, that delegation tokens eventually expire
> and so aren't useful to a long-running program. Consider making use of
> keytabs here.
>
> Hope this helps!
> -Eron
>
>
> On Thu, Aug 17, 2017 at 9:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I think this needs to be done by the admin.
>>
>> On Thu, Aug 17, 2017 at 9:37 AM, Raja.Aravapalli <
>> raja.aravapa...@target.com> wrote:
>>
>>> I don’t have access to the site.xml files, it is controlled by a support
>>> team.
>>>
>>> Does flink has any configuration settings or api’s thru which we can
>>> control this ?
>>>
>>> Regards,
>>> Raja.
>>>
>>> *From: *Ted Yu <yuzhih...@gmail.com>
>>> *Date: *Thursday, August 17, 2017 at 11:07 AM
>>> *To: *Raja Aravapalli <raja.aravapa...@target.com>
>>> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
>>> *Subject: *Re: [EXTERNAL] Re: Fink application failing with kerberos
>>> issue after running successfully without any issues for few days
>>>
>>> Can you try shortening renewal interval to something like 28800000 ?
>>>
>>> Cheers
>>>
>>> On Thu, Aug 17, 2017 at 8:58 AM, Raja.Aravapalli <
>>> raja.aravapa...@target.com> wrote:
>>>
>>> Hi Ted,
>>>
>>> Below is what I see in the environment:
>>>
>>> dfs.namenode.delegation.token.max-lifetime: *604800000*
>>> dfs.namenode.delegation.token.renew-interval: *86400000*
>>>
>>> Thanks.
>>>
>>> Regards,
>>> Raja.
>>>
>>> *From: *Ted Yu <yuzhih...@gmail.com>
>>> *Date: *Thursday, August 17, 2017 at 10:46 AM
>>> *To: *Raja Aravapalli <raja.aravapa...@target.com>
>>> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
>>> *Subject: *[EXTERNAL] Re: Fink application failing with kerberos issue
>>> after running successfully without any issues for few days
>>>
>>> What are the values for the following parameters ?
>>>
>>> dfs.namenode.delegation.token.max-lifetime
>>> dfs.namenode.delegation.token.renew-interval
>>>
>>> Cheers
>>>
>>> On Thu, Aug 17, 2017 at 8:24 AM, Raja.Aravapalli <
>>> raja.aravapa...@target.com> wrote:
>>>
>>> Hi Ted,
>>>
>>> Find below the configuration I see in yarn-site.xml
>>>
>>> <property>
>>>   <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
>>>   <value>true</value>
>>> </property>
>>>
>>> Regards,
>>> Raja.
>>>
>>> *From: *Ted Yu <yuzhih...@gmail.com>
>>> *Date: *Wednesday, August 16, 2017 at 9:05 PM
>>> *To: *Raja Aravapalli <raja.aravapa...@target.com>
>>> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
>>> *Subject: *[EXTERNAL] Re: hadoop
>>>
>>> Can you check the following config in yarn-site.xml ?
>>>
>>> yarn.resourcemanager.proxy-user-privileges.enabled (true)
>>>
>>> Cheers
>>>
>>> On Wed, Aug 16, 2017 at 4:48 PM, Raja.Aravapalli <
>>> raja.aravapa...@target.com> wrote:
>>>
>>> Hi,
>>>
>>> I triggered an flink yarn-session on a running Hadoop cluster… and
>>> triggering streaming application on that.
>>>
>>> But, I see after few days of running without any issues, the flink
>>> application which is writing data to hdfs failing with below exception.
>>>
>>> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>> token (HDFS_DELEGATION_TOKEN token xxxxxx for xxxxxx) can't be found in cache
>>>
>>> Can someone please help me how I can fix this. Thanks a lot.
>>>
>>> Regards,
>>> Raja.
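
For reference, the millisecond values quoted in this thread can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the constant and variable names are made up for the example, and the script does not talk to HDFS, YARN, or Flink. It just converts the reported settings into readable durations.

```python
from datetime import timedelta

# Values reported in the thread, in milliseconds.
MAX_LIFETIME_MS = 604_800_000     # dfs.namenode.delegation.token.max-lifetime
RENEW_INTERVAL_MS = 86_400_000    # dfs.namenode.delegation.token.renew-interval
SUGGESTED_RENEW_MS = 28_800_000   # shorter renew interval suggested in the thread


def to_duration(ms: int) -> timedelta:
    """Convert a millisecond setting into a timedelta for readability."""
    return timedelta(milliseconds=ms)


max_lifetime = to_duration(MAX_LIFETIME_MS)
renew_interval = to_duration(RENEW_INTERVAL_MS)
suggested_interval = to_duration(SUGGESTED_RENEW_MS)

# How many renew intervals fit inside the maximum token lifetime.
intervals_before_expiry = MAX_LIFETIME_MS // RENEW_INTERVAL_MS

print("max lifetime:      ", max_lifetime)
print("renew interval:    ", renew_interval)
print("suggested interval:", suggested_interval)
print("renew intervals before hard expiry:", intervals_before_expiry)
```

Read this way, the settings match Eron's explanation: the token is renewed every 24 hours and expires for good after 7 days. Lowering the renew interval to 28800000 ms (8 hours) would only change how often the token is renewed; the 7-day max-lifetime would still apply, which is why keytabs were suggested for long-running jobs. As far as I know, Flink reads the keytab settings from `security.kerberos.login.keytab` and `security.kerberos.login.principal` in flink-conf.yaml, but please check the documentation for your Flink version.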