Super nice to hear :-)
On Mon, Nov 9, 2015 at 4:48 PM, Niels Basjes <[email protected]> wrote:

> Apparently I just had to wait a bit longer for the first run.
> Now I'm able to package the project in about 7 minutes.
>
> Current status: I am now able to access HBase from within Flink on a
> Kerberos-secured cluster.
> I'm cleaning up the patch so I can submit it in a few days.
>
> On Sat, Nov 7, 2015 at 10:01 PM, Stephan Ewen <[email protected]> wrote:
>
>> The single shading step on my machine (SSD, 10 GB RAM) takes about 45
>> seconds. An HDD may take significantly longer, but it should really not
>> be more than 10 minutes.
>>
>> Is your maven build always stuck in that stage (flink-dist), showing a
>> long list of dependencies (saying "including org.x.y", "including
>> com.foo.bar", ...)?
>>
>> On Sat, Nov 7, 2015 at 9:57 PM, Sachin Goel <[email protected]> wrote:
>>
>>> Usually, if all the dependencies are being downloaded, i.e. on the
>>> first build, it will likely take 30-40 minutes. Subsequent builds might
>>> take approximately 10 minutes. [I have the same PC configuration.]
>>>
>>> -- Sachin Goel
>>> Computer Science, IIT Delhi
>>> m. +91-9871457685
>>>
>>> On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes <[email protected]> wrote:
>>>
>>>> How long should this take if you have an HDD and about 8 GB of RAM?
>>>> Is that 10 minutes? 20?
>>>>
>>>> Niels
>>>>
>>>> On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen <[email protected]> wrote:
>>>>
>>>>> Hi Niels!
>>>>>
>>>>> Usually, you simply build the binaries by invoking "mvn -DskipTests
>>>>> clean package" in the root flink directory. The resulting distribution
>>>>> should be in the "build-target" directory.
>>>>>
>>>>> If the build gets stuck, let us know where, and what the last message
>>>>> on the command line is.
>>>>>
>>>>> Please be aware that the final step of building the "flink-dist"
>>>>> project may take a while, especially on systems with hard disks (as
>>>>> opposed to SSDs) and a comparatively low amount of memory.
>>>>> The reason is that building the final JAR file is quite expensive:
>>>>> the system re-packages certain libraries in order to avoid conflicts
>>>>> between different versions.
>>>>>
>>>>> Stephan
>>>>>
>>>>> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Excellent.
>>>>>> What you can help me with are the commands to build the binary
>>>>>> distribution from source.
>>>>>> I tried it last Thursday and the build seemed to get stuck at some
>>>>>> point (at the end of / just after building the dist module).
>>>>>> I haven't been able to figure out why yet.
>>>>>>
>>>>>> Niels
>>>>>>
>>>>>> On 5 Nov 2015 14:57, "Maximilian Michels" <[email protected]> wrote:
>>>>>>
>>>>>>> Thank you for looking into the problem, Niels. Let us know if you
>>>>>>> need anything. We would be happy to merge a pull request once you
>>>>>>> have verified the fix.
>>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <[email protected]> wrote:
>>>>>>>
>>>>>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>>>>>
>>>>>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Niels,
>>>>>>>>> thank you for analyzing the issue so thoroughly. I agree with you.
>>>>>>>>> It seems that HDFS and HBase are using their own tokens, which we
>>>>>>>>> need to transfer from the client to the YARN containers. We should
>>>>>>>>> be able to port the fix from Spark (which they got from Storm) into
>>>>>>>>> our YARN client.
>>>>>>>>> I think we would add this in org.apache.flink.yarn.Utils#setTokensFor().
>>>>>>>>>
>>>>>>>>> Do you want to implement and verify the fix yourself?
>>>>>>>>> If you are too busy at the moment, we can also discuss how we
>>>>>>>>> share the work (I'm implementing it, you test the fix).
>>>>>>>>>
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Update on the status so far: I suspect I found a problem in a
>>>>>>>>>> secure setup.
>>>>>>>>>>
>>>>>>>>>> I have created a very simple Flink topology consisting of a
>>>>>>>>>> streaming Source (that outputs the timestamp a few times per
>>>>>>>>>> second) and a Sink (that puts that timestamp into a single record
>>>>>>>>>> in HBase).
>>>>>>>>>> Running this on a non-secure Yarn cluster works fine.
>>>>>>>>>>
>>>>>>>>>> To run it on a secured Yarn cluster, my main routine now looks
>>>>>>>>>> like this:
>>>>>>>>>>
>>>>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>>>>>     UserGroupInformation.loginUserFromKeytab("[email protected]",
>>>>>>>>>>         "/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>>>>>
>>>>>>>>>>     final StreamExecutionEnvironment env =
>>>>>>>>>>         StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>>>>>     env.setParallelism(1);
>>>>>>>>>>
>>>>>>>>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>>>>>>>     stream.addSink(new SetHBaseRowSink());
>>>>>>>>>>     env.execute("Long running Flink application");
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> When I run this:
>>>>>>>>>>
>>>>>>>>>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>>>>>>>>
>>>>>>>>>> I see after the startup messages:
>>>>>>>>>>
>>>>>>>>>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>>>>>>>>>   - Login successful for user [email protected] using keytab file
>>>>>>>>>>   /home/nbasjes/.krb/nbasjes.keytab
>>>>>>>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>>>>>>>>
>>>>>>>>>> Which looks good.
>>>>>>>>>>
>>>>>>>>>> However ... no data goes into HBase.
>>>>>>>>>> After some digging I found this error in the task manager's log:
>>>>>>>>>>
>>>>>>>>>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient
>>>>>>>>>>   - Exception encountered while connecting to the server :
>>>>>>>>>>   javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>>>>>>>>   GSSException: No valid credentials provided (Mechanism level: Failed
>>>>>>>>>>   to find any Kerberos tgt)]
>>>>>>>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient
>>>>>>>>>>   - SASL authentication failed. The most likely cause is
>>>>>>>>>>   missing or invalid credentials. Consider 'kinit'.
>>>>>>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>>>>>>>> GSSException: No valid credentials provided (Mechanism level: Failed
>>>>>>>>>> to find any Kerberos tgt)]
>>>>>>>>>>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>>>>>>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>>>>>>>
>>>>>>>>>> First starting a yarn-session and then loading my job gives the
>>>>>>>>>> same error.
>>>>>>>>>>
>>>>>>>>>> My best guess at this point is that Flink needs the same fix as
>>>>>>>>>> described here:
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-6918
>>>>>>>>>> (https://github.com/apache/spark/pull/5586)
>>>>>>>>>>
>>>>>>>>>> What do you guys think?
>>>>>>>>>>
>>>>>>>>>> Niels Basjes
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Niels,
>>>>>>>>>>>
>>>>>>>>>>> You're welcome. Some more information on how this would be
>>>>>>>>>>> configured:
>>>>>>>>>>>
>>>>>>>>>>> In the kdc.conf, there are two variables:
>>>>>>>>>>>
>>>>>>>>>>> max_life = 2h 0m 0s
>>>>>>>>>>> max_renewable_life = 7d 0h 0m 0s
>>>>>>>>>>>
>>>>>>>>>>> max_life is the maximum life of the current ticket. However, the
>>>>>>>>>>> ticket may be renewed up to a time span of max_renewable_life from
>>>>>>>>>>> the first ticket issue on. This means that from the first ticket
>>>>>>>>>>> issue, new tickets may be requested for one week. Each renewed
>>>>>>>>>>> ticket has a life time of max_life (2 hours in this case).
>>>>>>>>>>>
>>>>>>>>>>> Please let us know about any difficulties with long-running
>>>>>>>>>>> streaming applications and Kerberos.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your feedback.
>>>>>>>>>>>> So I guess I'll have to talk to the security guys about having
>>>>>>>>>>>> special Kerberos ticket expiry times for these types of jobs.
>>>>>>>>>>>>
>>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Niels,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your question.
>>>>>>>>>>>>> Flink relies entirely on the Kerberos support of Hadoop. So
>>>>>>>>>>>>> your question could also be rephrased as "Does Hadoop support
>>>>>>>>>>>>> long-term authentication using Kerberos?". And the answer is:
>>>>>>>>>>>>> Yes!
>>>>>>>>>>>>>
>>>>>>>>>>>>> While Hadoop uses Kerberos tickets to authenticate users with
>>>>>>>>>>>>> services initially, the authentication process continues
>>>>>>>>>>>>> differently afterwards. Instead of saving the ticket to
>>>>>>>>>>>>> authenticate on a later access, Hadoop creates its own security
>>>>>>>>>>>>> tokens (DelegationToken) that it passes around. These are
>>>>>>>>>>>>> authenticated against Kerberos periodically. To my knowledge,
>>>>>>>>>>>>> the tokens have a life span identical to the Kerberos ticket
>>>>>>>>>>>>> maximum life span. So be sure to set the maximum life span very
>>>>>>>>>>>>> high for long streaming jobs. The renewal time, on the other
>>>>>>>>>>>>> hand, is not important because Hadoop abstracts this away using
>>>>>>>>>>>>> its own security tokens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on Yarn,
>>>>>>>>>>>>> then it is sufficient to authenticate the client with Kerberos.
>>>>>>>>>>>>> On a Flink standalone cluster you need to ensure that,
>>>>>>>>>>>>> initially, all nodes are authenticated with Kerberos using the
>>>>>>>>>>>>> kinit tool.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Feel free to ask if you have more questions, and let us know
>>>>>>>>>>>>> about any difficulties.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <[email protected]> wrote:
>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I want to write a long running (i.e.
>>>>>>>>>>>>> > never stop it) streaming Flink
>>>>>>>>>>>>> > application on a Kerberos-secured Hadoop/Yarn cluster. My
>>>>>>>>>>>>> > application needs to do things with files on HDFS and HBase
>>>>>>>>>>>>> > tables on that cluster, so having the correct Kerberos tickets
>>>>>>>>>>>>> > is very important. The stream is to be ingested from Kafka.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > One of the things with Kerberos is that the tickets expire
>>>>>>>>>>>>> > after a predetermined time. My knowledge about Kerberos is
>>>>>>>>>>>>> > very limited, so I hope you guys can help me.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > My question is actually quite simple: Is there a howto
>>>>>>>>>>>>> > somewhere on how to correctly run a long running Flink
>>>>>>>>>>>>> > application with Kerberos that includes a solution for the
>>>>>>>>>>>>> > Kerberos ticket timeout?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Niels Basjes
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
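[Editor's note] The fix Robert proposes in the thread above (porting SPARK-6918 / Storm's token handling into org.apache.flink.yarn.Utils#setTokensFor()) boils down to fetching an HBase delegation token on the Kerberos-authenticated client and shipping it to the YARN containers alongside the HDFS tokens. The sketch below is an assumption, not the finished Flink patch: it mirrors the reflection approach from the Spark pull request, calling TokenUtil.obtainToken(Configuration) only if HBase happens to be on the classpath, so the YARN client keeps working without an HBase dependency.

```java
import java.lang.reflect.Method;

class HBaseTokenSketch {

    /**
     * Sketch of the client-side half of a SPARK-6918-style fix. Returns the
     * delegation token object (an org.apache.hadoop.security.token.Token),
     * or null if HBase is not on the classpath. 'hbaseConf' must be an
     * org.apache.hadoop.conf.Configuration with hbase-site.xml loaded.
     */
    static Object obtainHBaseToken(Object hbaseConf) {
        try {
            // TokenUtil.obtainToken(Configuration) is the call the Spark
            // pull request (https://github.com/apache/spark/pull/5586) uses.
            Class<?> tokenUtil =
                Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil");
            Class<?> confClass =
                Class.forName("org.apache.hadoop.conf.Configuration");
            Method obtain = tokenUtil.getMethod("obtainToken", confClass);
            return obtain.invoke(null, hbaseConf);
        } catch (ClassNotFoundException e) {
            return null; // HBase (or Hadoop) not on the classpath: nothing to do
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not fetch HBase delegation token", e);
        }
    }

    public static void main(String[] args) {
        // Without HBase on the classpath this returns null and does nothing else.
        System.out.println(obtainHBaseToken(null));
    }
}
```

A real fix would call this next to the existing HDFS token handling and add the returned token to the Credentials placed in the containers' launch context; the class and method names above come from the Spark change, not from a verified Flink implementation.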
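[Editor's note] Max's kdc.conf numbers above imply a hard upper bound on how long a job can keep authenticating with renewed tickets alone. A small illustration of that arithmetic; only the two durations come from the thread, the timestamp is hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrates the window implied by max_life and max_renewable_life. */
class TicketWindow {

    // Values from the kdc.conf discussed in the thread.
    static final Duration MAX_LIFE = Duration.ofHours(2);          // max_life = 2h 0m 0s
    static final Duration MAX_RENEWABLE_LIFE = Duration.ofDays(7); // max_renewable_life = 7d 0h 0m 0s

    /** Last instant at which a renewal may still be requested. */
    static Instant renewableUntil(Instant firstIssue) {
        return firstIssue.plus(MAX_RENEWABLE_LIFE);
    }

    /** A ticket renewed at the last possible moment lives max_life longer. */
    static Instant lastPossibleExpiry(Instant firstIssue) {
        return renewableUntil(firstIssue).plus(MAX_LIFE);
    }

    public static void main(String[] args) {
        Instant firstIssue = Instant.parse("2015-11-03T17:13:24Z"); // hypothetical first ticket
        System.out.println("renewable until:      " + renewableUntil(firstIssue));
        System.out.println("last possible expiry: " + lastPossibleExpiry(firstIssue));
    }
}
```

Past lastPossibleExpiry no renewal can help, which is why Max suggests a very high maximum life span for long-running streaming jobs (or, as in the thread, logging in from a keytab so fresh tickets can be obtained).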
