Hi,

No, this issue is now gone for us. The fix in 1.2.0 ensured that we are now able to run jobs on our cluster beyond the 7-day limit.
Niels

On Wed, Apr 12, 2017 at 5:35 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Niels, are you still facing this issue?
>
> As far as I understood it, the security changes in Flink 1.2.0 use a new
> Kerberos mechanism that allows infinite token renewal.
>
> On Thu, Mar 17, 2016 at 7:30 AM, Maximilian Michels <m...@apache.org>
> wrote:
>
>> Hi Niels,
>>
>> Thanks for the feedback. As far as I know, Hadoop deliberately
>> defaults to a one-week maximum lifetime for delegation tokens. Have
>> you tried increasing the maximum token lifetime or was that not an
>> option?
>>
>> I wonder why you use a while loop. Would it be possible to use the
>> Yarn failover mechanism which starts a new ApplicationMaster and
>> resubmits the job?
>>
>> Thanks,
>> Max
>>
>>
>> On Thu, Mar 17, 2016 at 12:43 PM, Niels Basjes <ni...@basjes.nl> wrote:
>> > Hi,
>> >
>> > In my environment doing the "proxy" thing didn't work.
>> > With a token expiry of 168 hours (1 week) the job consistently terminates
>> > at exactly (within a margin of 10 seconds) 173.5 hours.
>> > So far we have not been able to solve this problem.
>> >
>> > Our teams now simply assume the thing fails once in a while and have an
>> > automatic restart feature (i.e. shell script with a while true loop).
>> > The best guess at a root cause is this
>> > https://issues.apache.org/jira/browse/HDFS-9276
>> >
>> > If you have a real solution or a reference to a bug report related to
>> > this problem then please share!
>> >
>> > Niels Basjes
>> >
>> >
>> > On Thu, Mar 17, 2016 at 10:20 AM, Thomas Lamirault
>> > <thomas.lamira...@ericsson.com> wrote:
>> >>
>> >> Hi Max,
>> >>
>> >> I will try these workarounds.
>> >> Thanks
>> >>
>> >> Thomas
>> >>
>> >> ________________________________________
>> >> From: Maximilian Michels [m...@apache.org]
>> >> Sent: Tuesday, 15 March 2016 16:51
>> >> To: user@flink.apache.org
>> >> Cc: Niels Basjes
>> >> Subject: Re: Flink job on secure Yarn fails after many hours
>> >>
>> >> Hi Thomas,
>> >>
>> >> Niels (CC) and I found out that you need at least Hadoop version 2.6.1
>> >> to properly run Kerberos applications on Hadoop clusters. Versions
>> >> before that have critical bugs related to the internal security token
>> >> handling that may expire the token although it is still valid.
>> >>
>> >> That said, there is another limitation of Hadoop: the maximum
>> >> internal token lifetime is one week. To work around this limit, you
>> >> have two options:
>> >>
>> >> a) increasing the maximum token lifetime
>> >>
>> >> In yarn-site.xml:
>> >>
>> >> <property>
>> >>   <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
>> >>   <value>9223372036854775807</value>
>> >> </property>
>> >>
>> >> In hdfs-site.xml:
>> >>
>> >> <property>
>> >>   <name>dfs.namenode.delegation.token.max-lifetime</name>
>> >>   <value>9223372036854775807</value>
>> >> </property>
>> >>
>> >>
>> >> b) set up the Yarn ResourceManager as a proxy for the HDFS Namenode:
>> >>
>> >> From
>> >> http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
>> >>
>> >> "You can work around this by configuring the ResourceManager as a
>> >> proxy user for the corresponding HDFS NameNode so that the
>> >> ResourceManager can request new tokens when the existing ones are past
>> >> their maximum lifetime."
>> >>
>> >> @Niels: Could you comment on what worked best for you?
>> >>
>> >> Best,
>> >> Max
>> >>
>> >>
>> >> On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
>> >> <thomas.lamira...@ericsson.com> wrote:
>> >> >
>> >> > Hello everyone,
>> >> >
>> >> > We are facing the same problem now in our Flink applications, launched
>> >> > using YARN.
>> >> >
>> >> > Just want to know if there is any update about this exception?
>> >> >
>> >> > Thanks
>> >> >
>> >> > Thomas
>> >> >
>> >> > ________________________________
>> >> >
>> >> > From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes
>> >> > [ni...@basjes.nl]
>> >> > Sent: Friday, 4 December 2015 10:40
>> >> > To: user@flink.apache.org
>> >> > Subject: Re: Flink job on secure Yarn fails after many hours
>> >> >
>> >> > Hi Maximilian,
>> >> >
>> >> > I just downloaded the version from your google drive and used that to
>> >> > run my test topology that accesses HBase.
>> >> > I deliberately started it twice to double the chance to run into this
>> >> > situation.
>> >> >
>> >> > I'll keep you posted.
>> >> >
>> >> > Niels
>> >> >
>> >> >
>> >> > On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <m...@apache.org>
>> >> > wrote:
>> >> >>
>> >> >> Hi Niels,
>> >> >>
>> >> >> Just got back from our CI. The build above would fail with a
>> >> >> Checkstyle error. I corrected that. Also I have built the binaries for
>> >> >> your Hadoop version 2.6.0.
>> >> >>
>> >> >> Binaries:
>> >> >>
>> >> >> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
>> >> >>
>> >> >> Thanks,
>> >> >> Max
>> >> >>
>> >> >> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
>> >> >> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager - Actor akka://flink/user/jobmanager#403236912 terminated, stopping process...
>> >> >> >>>> >> >> > 21:30:28,286 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
>> >> >> >>>> >> >> >
>> >> >> >>>> >> >> > --
>> >> >> >>>> >> >> > Best regards / Met vriendelijke groeten,
>> >> >> >>>> >> >> >
>> >> >> >>>> >> >> > Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes