I created https://issues.apache.org/jira/browse/FLINK-2977
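
For the record, a rough sketch of what porting the Spark/Storm fix into
org.apache.flink.yarn.Utils#setTokensFor() could look like. This is only my
sketch, not existing Flink code: it assumes HBase's
org.apache.hadoop.hbase.security.token.TokenUtil (present in HBase 0.98/1.x,
method names may differ in other versions) is on the client classpath, and
the helper class name is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class HBaseTokenHelper {

    /**
     * Sketch only: obtains an HBase delegation token for the current
     * Kerberos-authenticated user and adds it to the credentials that the
     * YARN client ships to the containers.
     */
    public static void addHBaseDelegationToken(Credentials credentials)
            throws Exception {
        Configuration hbaseConf = HBaseConfiguration.create();
        // Only needed when HBase itself is secured with Kerberos.
        if ("kerberos".equals(hbaseConf.get("hbase.security.authentication"))) {
            // TokenUtil.obtainToken(Configuration) is the HBase 0.98/1.x API.
            Token<?> token = TokenUtil.obtainToken(hbaseConf);
            credentials.addToken(token.getService(), token);
        }
    }
}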
On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Niels,
> thank you for analyzing the issue so thoroughly. I agree with you. It
> seems that HDFS and HBase are using their own tokens, which we need to
> transfer from the client to the YARN containers. We should be able to
> port the fix from Spark (which they got from Storm) into our YARN client.
> I think we would add this in org.apache.flink.yarn.Utils#setTokensFor().
>
> Do you want to implement and verify the fix yourself? If you are too busy
> at the moment, we can also discuss how we share the work (I'm
> implementing it, you test the fix).
>
>
> Robert
>
> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Update on the status so far... I suspect I found a problem in a secure
>> setup.
>>
>> I have created a very simple Flink topology consisting of a streaming
>> Source (that outputs the timestamp a few times per second) and a Sink
>> (that puts that timestamp into a single record in HBase).
>> Running this on a non-secure YARN cluster works fine.
>>
>> To run it on a secured YARN cluster, my main routine now looks like this:
>>
>> public static void main(String[] args) throws Exception {
>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>     UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net",
>>             "/home/nbasjes/.krb/nbasjes.keytab");
>>
>>     final StreamExecutionEnvironment env =
>>             StreamExecutionEnvironment.getExecutionEnvironment();
>>     env.setParallelism(1);
>>
>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>     stream.addSink(new SetHBaseRowSink());
>>     env.execute("Long running Flink application");
>> }
>>
>> When I run this:
>>
>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>
>> I see after the startup messages:
>>
>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>     - Login successful for user nbas...@xxxxxx.net using keytab file
>>       /home/nbasjes/.krb/nbasjes.keytab
>> 11/03/2015 17:13:25  Job execution switched to status RUNNING.
>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>> 11/03/2015 17:13:25  Custom Source -> Stream Sink(1/1) switched to RUNNING
>>
>> Which looks good.
>>
>> However ... no data goes into HBase.
>> After some digging I found this error in the task manager's log:
>>
>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient
>>     - Exception encountered while connecting to the server:
>>       javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>       GSSException: No valid credentials provided (Mechanism level:
>>       Failed to find any Kerberos tgt)]
>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient
>>     - SASL authentication failed. The most likely cause is missing or
>>       invalid credentials. Consider 'kinit'.
>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]
>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>     at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>
>> First starting a yarn-session and then loading my job gives the same
>> error.
>>
>> My best guess at this point is that Flink needs the same fix as
>> described here:
>>
>> https://issues.apache.org/jira/browse/SPARK-6918 (
>> https://github.com/apache/spark/pull/5586 )
>>
>> What do you guys think?
>>
>> Niels Basjes
>>
>>
>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> Hi Niels,
>>>
>>> You're welcome. Some more information on how this would be configured:
>>>
>>> In kdc.conf, there are two variables:
>>>
>>> max_life = 2h 0m 0s
>>> max_renewable_life = 7d 0h 0m 0s
>>>
>>> max_life is the maximum lifetime of the current ticket. However, the
>>> ticket may be renewed for up to a time span of max_renewable_life from
>>> the first ticket issue. This means that from the first ticket issue,
>>> new tickets may be requested for one week. Each renewed ticket has a
>>> lifetime of max_life (2 hours in this case).
>>>
>>> Please let us know about any difficulties with long-running streaming
>>> applications and Kerberos.
>>>
>>> Best regards,
>>> Max
>>>
>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your feedback.
>>>> So I guess I'll have to talk to the security guys about having
>>>> special Kerberos ticket expiry times for these types of jobs.
>>>>
>>>> Niels Basjes
>>>>
>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Niels,
>>>>>
>>>>> Thank you for your question. Flink relies entirely on the Kerberos
>>>>> support of Hadoop, so your question could also be rephrased as "Does
>>>>> Hadoop support long-term authentication using Kerberos?". And the
>>>>> answer is: Yes!
>>>>>
>>>>> While Hadoop uses Kerberos tickets to authenticate users with
>>>>> services initially, the authentication process continues differently
>>>>> afterwards. Instead of saving the ticket to authenticate on a later
>>>>> access, Hadoop creates its own security tokens (DelegationToken)
>>>>> that it passes around. These are authenticated against Kerberos
>>>>> periodically. To my knowledge, the tokens have a life span identical
>>>>> to the Kerberos ticket maximum life span, so be sure to set the
>>>>> maximum life span very high for long streaming jobs. The renewal
>>>>> time, on the other hand, is not important because Hadoop abstracts
>>>>> this away using its own security tokens.
>>>>>
>>>>> I'm afraid there is no Kerberos how-to yet. If you are on YARN, it
>>>>> is sufficient to authenticate the client with Kerberos. On a Flink
>>>>> standalone cluster you need to ensure that, initially, all nodes are
>>>>> authenticated with Kerberos using the kinit tool.
>>>>>
>>>>> Feel free to ask if you have more questions and let us know about
>>>>> any difficulties.
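>>>>>
>>>>> To make the delegation token mechanism concrete, here is a minimal
>>>>> sketch (plain Hadoop 2.x client API, nothing Flink-specific; the
>>>>> principal, keytab path, and renewer are placeholders) of logging in
>>>>> via Kerberos and then fetching the HDFS delegation tokens that
>>>>> actually travel with a job:
>>>>>
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.fs.FileSystem;
>>>>> import org.apache.hadoop.security.Credentials;
>>>>> import org.apache.hadoop.security.UserGroupInformation;
>>>>> import org.apache.hadoop.security.token.Token;
>>>>>
>>>>> public class DelegationTokenDemo {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         Configuration conf = new Configuration();
>>>>>         conf.set("hadoop.security.authentication", "kerberos");
>>>>>         UserGroupInformation.setConfiguration(conf);
>>>>>
>>>>>         // Initial authentication: Kerberos login from a keytab.
>>>>>         // Principal and keytab path are placeholders.
>>>>>         UserGroupInformation.loginUserFromKeytab(
>>>>>                 "user@EXAMPLE.COM", "/path/to/user.keytab");
>>>>>
>>>>>         // Ask the NameNode for HDFS delegation tokens. The renewer
>>>>>         // (typically the YARN ResourceManager's principal) may keep
>>>>>         // renewing them until the maximum life span is reached.
>>>>>         Credentials credentials = new Credentials();
>>>>>         FileSystem fs = FileSystem.get(conf);
>>>>>         fs.addDelegationTokens("yarn", credentials);
>>>>>
>>>>>         for (Token<?> token : credentials.getAllTokens()) {
>>>>>             System.out.println(token.getKind() + " for " + token.getService());
>>>>>         }
>>>>>     }
>>>>> }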
>>>>>
>>>>> Best regards,
>>>>> Max
>>>>>
>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I want to write a long-running (i.e. never stop it) streaming Flink
>>>>> > application on a Kerberos-secured Hadoop/YARN cluster. My
>>>>> > application needs to do things with files on HDFS and HBase tables
>>>>> > on that cluster, so having the correct Kerberos tickets is very
>>>>> > important. The stream is to be ingested from Kafka.
>>>>> >
>>>>> > One of the things with Kerberos is that the tickets expire after a
>>>>> > predetermined time. My knowledge about Kerberos is very limited, so
>>>>> > I hope you guys can help me.
>>>>> >
>>>>> > My question is actually quite simple: Is there a how-to somewhere
>>>>> > on how to correctly run a long-running Flink application with
>>>>> > Kerberos that includes a solution for the Kerberos ticket timeout?
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> > Niels Basjes
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>
>>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes