Great to hear you sorted things out. Looking forward to the pull request!
On Mon, Nov 9, 2015 at 4:50 PM, Stephan Ewen <se...@apache.org> wrote:
> Super nice to hear :-)
>
> On Mon, Nov 9, 2015 at 4:48 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>> Apparently I just had to wait a bit longer for the first run. Now I'm able to package the project in about 7 minutes.
>>
>> Current status: I am now able to access HBase from within Flink on a Kerberos-secured cluster. Cleaning up the patch so I can submit it in a few days.
>>
>> On Sat, Nov 7, 2015 at 10:01 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> The single shading step on my machine (SSD, 10 GB RAM) takes about 45 seconds. An HDD may take significantly longer, but it should really not be more than 10 minutes.
>>>
>>> Is your Maven build always stuck in that stage (flink-dist), showing a long list of dependencies (saying including org.x.y, including com.foo.bar, ...)?
>>>
>>> On Sat, Nov 7, 2015 at 9:57 PM, Sachin Goel <sachingoel0...@gmail.com> wrote:
>>>>
>>>> Usually, if all the dependencies are being downloaded, i.e. on the first build, it'll likely take 30-40 minutes. Subsequent builds might take approx. 10 minutes. [I have the same PC configuration.]
>>>>
>>>> -- Sachin Goel
>>>> Computer Science, IIT Delhi
>>>> m. +91-9871457685
>>>>
>>>> On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>
>>>>> How long should this take if you have an HDD and about 8 GB of RAM? Is that 10 minutes? 20?
>>>>>
>>>>> Niels
>>>>>
>>>>> On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>
>>>>>> Hi Niels!
>>>>>>
>>>>>> Usually, you simply build the binaries by invoking "mvn -DskipTests clean package" in the root flink directory. The resulting program should be in the "build-target" directory.
>>>>>>
>>>>>> If the build gets stuck, let us know where, and what the last message on the command line is.
>>>>>>
>>>>>> Please be aware that the final step of building the "flink-dist" project may take a while, especially on systems with hard disks (as opposed to SSDs) and a comparatively low amount of memory. The reason is that building the final JAR file is quite expensive, because the system re-packages certain libraries in order to avoid conflicts between different versions.
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes <ni...@basj.es> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Excellent. What you can help me with are the commands to build the binary distribution from source. I tried it last Thursday and the build seemed to get stuck at some point (at the end of / just after building the dist module). I haven't been able to figure out why yet.
>>>>>>>
>>>>>>> Niels
>>>>>>>
>>>>>>> On 5 Nov 2015 14:57, "Maximilian Michels" <m...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Thank you for looking into the problem, Niels. Let us know if you need anything. We would be happy to merge a pull request once you have verified the fix.
>>>>>>>>
>>>>>>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>>>
>>>>>>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>>>>>>
>>>>>>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Niels,
>>>>>>>>>> thank you for analyzing the issue so thoroughly. I agree with you.
>>>>>>>>>> It seems that HDFS and HBase are using their own tokens, which we need to transfer from the client to the YARN containers. We should be able to port the fix from Spark (which they got from Storm) into our YARN client. I think we would add this in org.apache.flink.yarn.Utils#setTokensFor().
>>>>>>>>>>
>>>>>>>>>> Do you want to implement and verify the fix yourself? If you are too busy at the moment, we can also discuss how to share the work (I implement it, you test the fix).
>>>>>>>>>>
>>>>>>>>>> Robert
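For context on the fix discussed above: the Spark change (SPARK-6918, referenced further down in this thread) boils down to obtaining an HBase delegation token on the already Kerberos-authenticated client and shipping it to the YARN containers together with the HDFS tokens. A minimal sketch of what such a port might look like, assuming the HBase 0.98-era TokenUtil API; the class and method names below (other than TokenUtil.obtainToken) are illustrative, not actual Flink code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.security.token.TokenUtil;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.security.token.Token;

    public class HBaseTokens {
        /**
         * Obtains an HBase delegation token on the client and adds it to the
         * credentials that the YARN client ships to the containers (in Flink,
         * the natural place would be org.apache.flink.yarn.Utils#setTokensFor()).
         */
        public static void addHBaseDelegationToken(Credentials credentials) throws Exception {
            Configuration hbaseConf = HBaseConfiguration.create();
            // Only attempt this when HBase is actually secured with Kerberos.
            if ("kerberos".equals(hbaseConf.get("hbase.security.authentication"))) {
                Token<?> token = TokenUtil.obtainToken(hbaseConf);
                credentials.addToken(token.getService(), token);
            }
        }
    }

Note that Spark's version loads TokenUtil via reflection so that HBase remains an optional dependency; a direct call is shown here only for readability.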
>>>>>>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Update on the status so far... I suspect I found a problem in a secure setup.
>>>>>>>>>>>
>>>>>>>>>>> I have created a very simple Flink topology consisting of a streaming Source (that outputs the timestamp a few times per second) and a Sink (that puts that timestamp into a single record in HBase). Running this on a non-secure YARN cluster works fine.
>>>>>>>>>>>
>>>>>>>>>>> To run it on a secured YARN cluster, my main routine now looks like this:
>>>>>>>>>>>
>>>>>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>>>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>>>>>>     UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net", "/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>>>>>>
>>>>>>>>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>>>>>>     env.setParallelism(1);
>>>>>>>>>>>
>>>>>>>>>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>>>>>>>>     stream.addSink(new SetHBaseRowSink());
>>>>>>>>>>>     env.execute("Long running Flink application");
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> When I run this:
>>>>>>>>>>>
>>>>>>>>>>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>>>>>>>>>
>>>>>>>>>>> I see after the startup messages:
>>>>>>>>>>>
>>>>>>>>>>> 17:13:24,466 INFO org.apache.hadoop.security.UserGroupInformation - Login successful for user nbas...@xxxxxx.net using keytab file /home/nbasjes/.krb/nbasjes.keytab
>>>>>>>>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>>>>>>>>>
>>>>>>>>>>> Which looks good.
>>>>>>>>>>>
>>>>>>>>>>> However ... no data goes into HBase. After some digging I found this error in the task manager's log:
>>>>>>>>>>>
>>>>>>>>>>> 17:13:42,677 WARN org.apache.hadoop.hbase.ipc.RpcClient - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>>>>>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
>>>>>>>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>>>>>>>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>>>>>>>     at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>>>>>>>>
>>>>>>>>>>> First starting a yarn-session and then loading my job gives the same error.
>>>>>>>>>>>
>>>>>>>>>>> My best guess at this point is that Flink needs the same fix as described here:
>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-6918 ( https://github.com/apache/spark/pull/5586 )
>>>>>>>>>>>
>>>>>>>>>>> What do you guys think?
>>>>>>>>>>>
>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Niels,
>>>>>>>>>>>>
>>>>>>>>>>>> You're welcome. Some more information on how this would be configured:
>>>>>>>>>>>>
>>>>>>>>>>>> In the kdc.conf, there are two variables:
>>>>>>>>>>>>
>>>>>>>>>>>> max_life = 2h 0m 0s
>>>>>>>>>>>> max_renewable_life = 7d 0h 0m 0s
>>>>>>>>>>>>
>>>>>>>>>>>> max_life is the maximum life of the current ticket. However, it may be renewed up to a time span of max_renewable_life from the first ticket issue on. This means that from the first ticket issue, new tickets may be requested for one week. Each renewed ticket has a lifetime of max_life (2 hours in this case).
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know about any difficulties with long-running streaming applications and Kerberos.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Max
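To make the two settings above concrete: with max_life = 2h and max_renewable_life = 7d, the client would typically request a renewable ticket along these lines (the keytab path and principal are placeholders):

    kinit -r 7d -l 2h -kt /path/to/user.keytab <principal>

Such a ticket can then be renewed (e.g. with "kinit -R") every couple of hours, but only for up to one week after the first issue.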
>>>>>>>>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your feedback. So I guess I'll have to talk to the security guys about having special Kerberos ticket expiry times for these types of jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Niels,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your question. Flink relies entirely on the Kerberos support of Hadoop. So your question could also be rephrased to "Does Hadoop support long-term authentication using Kerberos?". And the answer is: Yes!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While Hadoop uses Kerberos tickets to authenticate users with services initially, the authentication process continues differently afterwards. Instead of saving the ticket to authenticate on a later access, Hadoop creates its own security tokens (DelegationToken) that it passes around. These are authenticated against Kerberos periodically. To my knowledge, the tokens have a life span identical to the Kerberos ticket's maximum life span. So be sure to set the maximum life span very high for long streaming jobs. The renewal time, on the other hand, is not important because Hadoop abstracts this away using its own security tokens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on YARN, then it is sufficient to authenticate the client with Kerberos. On a Flink standalone cluster you need to ensure that, initially, all nodes are authenticated with Kerberos using the kinit tool.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Feel free to ask if you have more questions and let us know about any difficulties.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Max
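For the standalone-cluster case mentioned above, one common pattern for keeping a long-running JVM authenticated is to log in from a keytab once and then periodically re-check the TGT via Hadoop's UserGroupInformation. A minimal sketch; the PeriodicRelogin class name and the hourly interval are illustrative assumptions, not Flink code:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.security.UserGroupInformation;

    public class PeriodicRelogin {
        /** Logs in from a keytab and re-checks the TGT once per hour. */
        public static void start(String principal, String keytabPath) throws Exception {
            UserGroupInformation.loginUserFromKeytab(principal, keytabPath);

            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    // No-op unless the TGT is close to expiry; then it
                    // re-authenticates from the keytab.
                    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, 1, 1, TimeUnit.HOURS);
        }
    }

Note that this only keeps the Kerberos TGT fresh; it does not by itself solve the HBase delegation-token problem discussed further up in this thread.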
>>>>>>>>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I want to write a long-running (i.e. never stop it) streaming Flink application on a Kerberos-secured Hadoop/YARN cluster. My application needs to do things with files on HDFS and HBase tables on that cluster, so having the correct Kerberos tickets is very important. The stream is to be ingested from Kafka.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > One of the things with Kerberos is that the tickets expire after a predetermined time. My knowledge about Kerberos is very limited, so I hope you guys can help me.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > My question is actually quite simple: Is there a how-to somewhere on how to correctly run a long-running Flink application with Kerberos that includes a solution for the Kerberos ticket timeout?
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Niels Basjes
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>>>
>>>>>>>>>>> Niels Basjes
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>
>>>>>>>>> Niels Basjes
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes