Hi,

In my environment the "proxy" approach didn't work. With a token expiry of 168 hours (1 week), the job consistently terminates at 173.5 hours (within a margin of 10 seconds), i.e. about 5.5 hours after the configured expiry. So far we have not been able to solve this problem.
Our teams now simply assume the job fails once in a while and use an automatic restart (i.e. a shell script with a while true loop).

The best guess at a root cause is this: https://issues.apache.org/jira/browse/HDFS-9276

If you have a real solution or a reference to a related bug report for this problem, please share!

Niels Basjes

On Thu, Mar 17, 2016 at 10:20 AM, Thomas Lamirault <thomas.lamira...@ericsson.com> wrote:

> Hi Max,
>
> I will try these workarounds.
> Thanks
>
> Thomas
>
> ________________________________________
> From: Maximilian Michels [m...@apache.org]
> Sent: Tuesday, 15 March 2016 16:51
> To: user@flink.apache.org
> Cc: Niels Basjes
> Subject: Re: Flink job on secure Yarn fails after many hours
>
> Hi Thomas,
>
> Niels (CC) and I found out that you need at least Hadoop version 2.6.1
> to properly run Kerberos applications on Hadoop clusters. Versions
> before that have critical bugs related to the internal security token
> handling that may expire the token although it is still valid.
>
> That said, there is another limitation of Hadoop that the maximum
> internal token life time is one week. To work around this limit, you
> have two options:
>
> a) increasing the maximum token life time
>
> In yarn-site.xml:
>
> <property>
>   <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
>   <value>9223372036854775807</value>
> </property>
>
> In hdfs-site.xml:
>
> <property>
>   <name>dfs.namenode.delegation.token.max-lifetime</name>
>   <value>9223372036854775807</value>
> </property>
>
>
> b) set up the Yarn ResourceManager as a proxy for the HDFS Namenode:
>
> From
> http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
>
> "You can work around this by configuring the ResourceManager as a
> proxy user for the corresponding HDFS NameNode so that the
> ResourceManager can request new tokens when the existing ones are past
> their maximum lifetime."
>
> @Niels: Could you comment on what worked best for you?
>
> Best,
> Max
>
>
> On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
> <thomas.lamira...@ericsson.com> wrote:
> >
> > Hello everyone,
> >
> > We are facing the same problem now in our Flink applications, launched
> > using YARN.
> >
> > Just want to know if there is any update about this exception?
> >
> > Thanks
> >
> > Thomas
> >
> > ________________________________
> >
> > From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes [ni...@basjes.nl]
> > Sent: Friday, 4 December 2015 10:40
> > To: user@flink.apache.org
> > Subject: Re: Flink job on secure Yarn fails after many hours
> >
> > Hi Maximilian,
> >
> > I just downloaded the version from your google drive and used that to
> > run my test topology that accesses HBase.
> > I deliberately started it twice to double the chance to run into this
> > situation.
> >
> > I'll keep you posted.
> >
> > Niels
> >
> >
> > On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <m...@apache.org>
> > wrote:
> >>
> >> Hi Niels,
> >>
> >> Just got back from our CI. The build above would fail with a
> >> Checkstyle error. I corrected that. Also I have built the binaries for
> >> your Hadoop version 2.6.0.
> >>
> >> Binaries:
> >>
> >> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
> >>
> >> Thanks,
> >> Max
> >>
> >> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
> >> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager
> >> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated,
> >> >>>> >> >> > stopping
> >> >>>> >> >> > process...
> >> >>>> >> >> > 21:30:28,286 INFO
> >> >>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
> >> >>>> >> >> > - Removing web root dir
> >> >>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
> >> >>>> >> >> >
> >> >>>> >> >> > --
> >> >>>> >> >> > Best regards / Met vriendelijke groeten,
> >> >>>> >> >> >
> >> >>>> >> >> > Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes
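
P.S. For anyone who wants the same stopgap: the "while true" restart wrapper mentioned at the top of this mail can be sketched roughly as below. This is a minimal sketch under assumptions, not our exact script. The function name restart_loop, the MAX_RESTARTS and RESTART_DELAY variables, and the jar path in the commented example are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical restart wrapper: rerun a command every time it exits.
# MAX_RESTARTS (0 = unlimited) and RESTART_DELAY (seconds) are
# illustrative names, not part of Flink or Hadoop.
restart_loop() {
    max="${MAX_RESTARTS:-0}"
    delay="${RESTART_DELAY:-30}"
    runs=0
    while true; do
        # Run the wrapped command and capture its exit status.
        if "$@"; then status=0; else status=$?; fi
        runs=$((runs + 1))
        echo "run $runs exited with status $status" >&2
        # Stop once the optional run limit is reached.
        if [ "$max" -gt 0 ] && [ "$runs" -ge "$max" ]; then
            return "$status"
        fi
        # Wait a little so a job that fails instantly does not spin.
        sleep "$delay"
    done
}

# Example (the jar path is an assumption, adjust to your own job):
# restart_loop flink run -m yarn-cluster /path/to/my-job.jar
```

With MAX_RESTARTS unset the loop never ends, which matches what we run. The limit is only there so the wrapper can be tried out safely. Note that this only hides the symptom: the job still dies when the token expires and comes back with whatever state it can recover.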