Hi,

In my environment the "proxy" approach didn't work.
With a token expiry of 168 hours (1 week), the job consistently terminates
at exactly 173.5 hours (within a margin of 10 seconds).
So far we have not been able to solve this problem.

Our teams now simply assume the job fails once in a while and use an
automatic restart mechanism (i.e. a shell script with a while-true loop).
Our best guess at the root cause is this issue:
https://issues.apache.org/jira/browse/HDFS-9276
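
For illustration, a minimal sketch of the kind of automatic restart loop mentioned above, written in Python rather than as the shell script referred to. The function name, the restart limit, and the delay are hypothetical placeholders and not the script actually in use.

```python
import subprocess
import sys
import time
from typing import List, Optional


def run_with_restart(cmd: List[str],
                     max_restarts: Optional[int] = None,
                     delay_seconds: float = 30.0) -> int:
    """Run cmd and relaunch it every time it exits.

    max_restarts=None restarts forever (the "while true" case).
    Returns the total number of launches once the limit is reached.
    """
    launches = 0
    while True:
        launches += 1
        exit_code = subprocess.call(cmd)  # blocks until the job process exits
        print(f"launch {launches} exited with code {exit_code}", file=sys.stderr)
        if max_restarts is not None and launches > max_restarts:
            return launches
        time.sleep(delay_seconds)  # short pause before resubmitting
```

Calling `run_with_restart(["./submit-job.sh"])`, where `submit-job.sh` is an assumed wrapper around the job submission, would keep resubmitting until the supervising process itself is stopped.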

If you have a real solution, or a reference to a bug report related to this
problem, please share!

Niels Basjes



On Thu, Mar 17, 2016 at 10:20 AM, Thomas Lamirault <
thomas.lamira...@ericsson.com> wrote:

> Hi Max,
>
> I will try these workarounds.
> Thanks
>
> Thomas
>
> ________________________________________
> From: Maximilian Michels [m...@apache.org]
> Sent: Tuesday, March 15, 2016 16:51
> To: user@flink.apache.org
> Cc: Niels Basjes
> Subject: Re: Flink job on secure Yarn fails after many hours
>
> Hi Thomas,
>
> Niels (CC) and I found out that you need at least Hadoop version 2.6.1
> to properly run Kerberos applications on Hadoop clusters. Versions
> before that have critical bugs in the internal security token
> handling that may expire a token although it is still valid.
>
> That said, Hadoop has another limitation: the maximum internal token
> lifetime is one week. To work around this limit, you have two options:
>
> a) Increase the maximum token lifetime
>
> In yarn-site.xml:
>
> <property>
>   <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
>   <value>9223372036854775807</value>
> </property>
>
> In hdfs-site.xml:
>
> <property>
>   <name>dfs.namenode.delegation.token.max-lifetime</name>
>   <value>9223372036854775807</value>
> </property>
>
>
> b) Set up the Yarn ResourceManager as a proxy for the HDFS Namenode:
>
> From
> http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
>
> "You can work around this by configuring the ResourceManager as a
> proxy user for the corresponding HDFS NameNode so that the
> ResourceManager can request new tokens when the existing ones are past
> their maximum lifetime."
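
As an aside on option a): both properties are specified in milliseconds (the Hadoop default of 604800000 ms corresponds to the one-week limit mentioned above), and 9223372036854775807 is the largest Java long, 2^63 - 1. A small Python sketch that checks this arithmetic and renders a property element for a chosen lifetime; the helper name `property_xml` and the 30-day value are illustrative assumptions, not a recommendation.

```python
from xml.sax.saxutils import escape

MS_PER_DAY = 24 * 60 * 60 * 1000   # milliseconds in one day
ONE_WEEK_MS = 7 * MS_PER_DAY       # the default maximum token lifetime
JAVA_LONG_MAX = 2**63 - 1          # the value used in the snippets above


def property_xml(name: str, lifetime_ms: int) -> str:
    """Render one Hadoop-style <property> element (hypothetical helper)."""
    if not 0 < lifetime_ms <= JAVA_LONG_MAX:
        raise ValueError("lifetime must be positive and fit in a Java long")
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{lifetime_ms}</value>\n"
        "</property>"
    )


# Example: a 30-day maximum lifetime for the HDFS delegation token.
print(property_xml("dfs.namenode.delegation.token.max-lifetime", 30 * MS_PER_DAY))
```

The same helper can be used with `yarn.resourcemanager.delegation.token.max-lifetime` for yarn-site.xml.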
>
> @Niels: Could you comment on what worked best for you?
>
> Best,
> Max
>
>
> On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
> <thomas.lamira...@ericsson.com> wrote:
> >
> > Hello everyone,
> >
> >
> >
> > We are now facing the same problem in our Flink applications, launched
> > using YARN.
> >
> > I just want to know if there is any update on this exception?
> >
> >
> >
> > Thanks
> >
> >
> >
> > Thomas
> >
> >
> >
> > ________________________________
> >
> > From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes [ni...@basjes.nl]
> > Sent: Friday, December 4, 2015 10:40
> > To: user@flink.apache.org
> > Subject: Re: Flink job on secure Yarn fails after many hours
> >
> > Hi Maximilian,
> >
> > I just downloaded the version from your Google Drive and used that to
> > run my test topology that accesses HBase.
> > I deliberately started it twice to double the chance of running into this
> > situation.
> >
> > I'll keep you posted.
> >
> > Niels
> >
> >
> > On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Hi Niels,
> >>
> >> Just got back from our CI. The build above would fail with a
> >> Checkstyle error. I corrected that. Also I have built the binaries for
> >> your Hadoop version 2.6.0.
> >>
> >> Binaries:
> >>
> >> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
> >>
> >> Thanks,
> >> Max
> >>
> >> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
> >> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager
> >> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated, stopping process...
> >> >>>> >> >> > 21:30:28,286 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
> >> >>>> >> >> > - Removing web root dir /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
> >> >>>> >> >> >
> >> >>>> >> >> >
> >> >>>> >> >> > --
> >> >>>> >> >> > Best regards / Met vriendelijke groeten,
> >> >>>> >> >> >
> >> >>>> >> >> > Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes