Niels, are you still facing this issue? As far as I understood it, the security changes in Flink 1.2.0 use a new Kerberos mechanism that allows infinite token renewal.
On Thu, Mar 17, 2016 at 7:30 AM, Maximilian Michels <m...@apache.org> wrote:
> Hi Niels,
>
> Thanks for the feedback. As far as I know, Hadoop deliberately
> defaults to the one week maximum life time of delegation tokens. Have
> you tried increasing the maximum token life time or was that not an
> option?
>
> I wonder why you use a while loop. Would it be possible to use the
> Yarn failover mechanism which starts a new ApplicationMaster and
> resubmits the job?
>
> Thanks,
> Max
>
> On Thu, Mar 17, 2016 at 12:43 PM, Niels Basjes <ni...@basjes.nl> wrote:
> > Hi,
> >
> > In my environment doing the "proxy" thing didn't work.
> > With a token expiry of 168 hours (1 week) the job consistently terminates
> > at exactly (within a margin of 10 seconds) 173.5 hours.
> > So far we have not been able to solve this problem.
> >
> > Our teams now simply assume the thing fails once in a while and have an
> > automatic restart feature (i.e. shell script with a while true loop).
> > The best guess at a root cause is this
> > https://issues.apache.org/jira/browse/HDFS-9276
> >
> > If you have a real solution or a reference to a related bug report to this
> > problem then please share!
> >
> > Niels Basjes
> >
> > On Thu, Mar 17, 2016 at 10:20 AM, Thomas Lamirault
> > <thomas.lamira...@ericsson.com> wrote:
> >>
> >> Hi Max,
> >>
> >> I will try these workarounds.
> >> Thanks
> >>
> >> Thomas
> >>
> >> ________________________________________
> >> From: Maximilian Michels [m...@apache.org]
> >> Sent: Tuesday, 15 March 2016 16:51
> >> To: user@flink.apache.org
> >> Cc: Niels Basjes
> >> Subject: Re: Flink job on secure Yarn fails after many hours
> >>
> >> Hi Thomas,
> >>
> >> Niels (CC) and I found out that you need at least Hadoop version 2.6.1
> >> to properly run Kerberos applications on Hadoop clusters. Versions
> >> before that have critical bugs related to the internal security token
> >> handling that may expire the token although it is still valid.
> >>
> >> That said, there is another limitation of Hadoop that the maximum
> >> internal token life time is one week. To work around this limit, you
> >> have two options:
> >>
> >> a) increase the maximum token life time
> >>
> >> In yarn-site.xml:
> >>
> >> <property>
> >> <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
> >> <value>9223372036854775807</value>
> >> </property>
> >>
> >> In hdfs-site.xml:
> >>
> >> <property>
> >> <name>dfs.namenode.delegation.token.max-lifetime</name>
> >> <value>9223372036854775807</value>
> >> </property>
> >>
> >> b) set up the Yarn ResourceManager as a proxy for the HDFS Namenode:
> >>
> >> From
> >> http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
> >>
> >> "You can work around this by configuring the ResourceManager as a
> >> proxy user for the corresponding HDFS NameNode so that the
> >> ResourceManager can request new tokens when the existing ones are past
> >> their maximum lifetime."
> >>
> >> @Niels: Could you comment on what worked best for you?
> >>
> >> Best,
> >> Max
> >>
> >> On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
> >> <thomas.lamira...@ericsson.com> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > We are facing the same problem now in our Flink applications, launched
> >> > using YARN.
> >> >
> >> > Just want to know if there is any update about this exception?
> >> >
> >> > Thanks
> >> >
> >> > Thomas
> >> >
> >> > ________________________________
> >> >
> >> > From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes
> >> > [ni...@basjes.nl]
> >> > Sent: Friday, 4 December 2015 10:40
> >> > To: user@flink.apache.org
> >> > Subject: Re: Flink job on secure Yarn fails after many hours
> >> >
> >> > Hi Maximilian,
> >> >
> >> > I just downloaded the version from your google drive and used that to
> >> > run my test topology that accesses HBase.
> >> > I deliberately started it twice to double the chance to run into this
> >> > situation.
> >> >
> >> > I'll keep you posted.
> >> >
> >> > Niels
> >> >
> >> > On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <m...@apache.org>
> >> > wrote:
> >> >>
> >> >> Hi Niels,
> >> >>
> >> >> Just got back from our CI. The build above would fail with a
> >> >> Checkstyle error. I corrected that. Also I have built the binaries for
> >> >> your Hadoop version 2.6.0.
> >> >>
> >> >> Binaries:
> >> >>
> >> >> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
> >> >>
> >> >> Thanks,
> >> >> Max
> >> >>
> >> >> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
> >> >> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager - Actor akka://flink/user/jobmanager#403236912 terminated, stopping process...
> >> >> >>>> >> >> > 21:30:28,286 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
> >> >> >>>> >> >> >
> >> >> >>>> >> >> > --
> >> >> >>>> >> >> > Best regards / Met vriendelijke groeten,
> >> >> >>>> >> >> >
> >> >> >>>> >> >> > Niels Basjes
> >> >> >>>> >> >
> >> >> >>>> >> > --
> >> >> >>>> >> > Best regards / Met vriendelijke groeten,
> >> >> >>>> >> >
> >> >> >>>> >> > Niels Basjes
> >> >> >>>> >
> >> >> >>>> > --
> >> >> >>>> > Best regards / Met vriendelijke groeten,
> >> >> >>>> >
> >> >> >>>> > Niels Basjes
> >> >> >>>
> >> >> >>> --
> >> >> >>> Best regards / Met vriendelijke groeten,
> >> >> >>>
> >> >> >>> Niels Basjes
> >> >
> >> > --
> >> > Best regards / Met vriendelijke groeten,
> >> >
> >> > Niels Basjes
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
>
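
For anyone reading this thread later: option b) in the quoted message (the ResourceManager acting as a proxy user for the HDFS NameNode) is only described in prose there. A rough sketch of what that configuration usually looks like follows, based on the Cloudera page linked in the thread. It assumes the ResourceManager runs as the user "yarn", and the wildcard values are placeholders that should be narrowed to the real ResourceManager hosts and groups. Nobody in this thread confirmed it working, and Niels reported that the proxy approach did not help in his environment.

```xml
<!-- yarn-site.xml: let the ResourceManager request new HDFS tokens as a proxy user -->
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: allow the "yarn" user to impersonate other users.
     Assumption: the ResourceManager runs as "yarn"; "*" is a placeholder
     and should be restricted to the actual hosts and groups. -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```

The affected Hadoop services need a restart before such changes take effect.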