Usually, the first build, when all the dependencies are being downloaded, will likely take 30-40 minutes. Subsequent builds take about 10 minutes. [I have the same PC configuration.]
-- 
Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes <[email protected]> wrote:

> How long should this take if you have HDD and about 8GB of RAM?
> Is that 10 minutes? 20?
>
> Niels
>
> On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen <[email protected]> wrote:
>
>> Hi Niels!
>>
>> Usually, you simply build the binaries by invoking
>> "mvn -DskipTests clean package" in the root flink directory. The
>> resulting program should be in the "build-target" directory.
>>
>> If the program gets stuck, let us know where and what the last message on
>> the command line is.
>>
>> Please be aware that the final step of building the "flink-dist" project
>> may take a while, especially on systems with hard disks (as opposed to
>> SSDs) and a comparatively low amount of memory. The reason is that the
>> building of the final JAR file is quite expensive, because the system
>> re-packages certain libraries in order to avoid conflicts between different
>> versions.
>>
>> Stephan
>>
>>
>> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Excellent.
>>> What you can help me with are the commands to build the binary
>>> distribution from source.
>>> I tried it last Thursday and the build seemed to get stuck at some point
>>> (at the end of/just after building the dist module).
>>> I haven't been able to figure out why yet.
>>>
>>> Niels
>>> On 5 Nov 2015 14:57, "Maximilian Michels" <[email protected]> wrote:
>>>
>>>> Thank you for looking into the problem, Niels. Let us know if you need
>>>> anything. We would be happy to merge a pull request once you have verified
>>>> the fix.
>>>>
>>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <[email protected]> wrote:
>>>>
>>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Niels,
>>>>>> thank you for analyzing the issue so properly.
>>>>>> I agree with you. It seems that HDFS and HBase are using their own
>>>>>> tokens which we need to transfer from the client to the YARN
>>>>>> containers. We should be able to port the fix from Spark (which they
>>>>>> got from Storm) into our YARN client.
>>>>>> I think we would add this in org.apache.flink.yarn.Utils#setTokensFor().
>>>>>>
>>>>>> Do you want to implement and verify the fix yourself? If you are too
>>>>>> busy at the moment, we can also discuss how we share the work (I'm
>>>>>> implementing it, you test the fix)
>>>>>>
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <[email protected]> wrote:
>>>>>>
>>>>>>> Update on the status so far.... I suspect I found a problem in a
>>>>>>> secure setup.
>>>>>>>
>>>>>>> I have created a very simple Flink topology consisting of a
>>>>>>> streaming Source (that outputs the timestamp a few times per second)
>>>>>>> and a Sink (that puts that timestamp into a single record in HBase).
>>>>>>> Running this on a non-secure Yarn cluster works fine.
>>>>>>>
>>>>>>> To run it on a secured Yarn cluster my main routine now looks like
>>>>>>> this:
>>>>>>>
>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>   System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>>   UserGroupInformation.loginUserFromKeytab("[email protected]",
>>>>>>>       "/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>>
>>>>>>>   final StreamExecutionEnvironment env =
>>>>>>>       StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>>   env.setParallelism(1);
>>>>>>>
>>>>>>>   DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>>>>   stream.addSink(new SetHBaseRowSink());
>>>>>>>   env.execute("Long running Flink application");
>>>>>>> }
>>>>>>>
>>>>>>> When I run this
>>>>>>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>>>>>
>>>>>>> I see after the startup messages:
>>>>>>>
>>>>>>> 17:13:24,466 INFO org.apache.hadoop.security.UserGroupInformation - Login successful for user [email protected] using keytab file /home/nbasjes/.krb/nbasjes.keytab
>>>>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>>>>>
>>>>>>> Which looks good.
>>>>>>>
>>>>>>> However ... no data goes into HBase.
>>>>>>> After some digging I found this error in the task manager's log:
>>>>>>>
>>>>>>> 17:13:42,677 WARN org.apache.hadoop.hbase.ipc.RpcClient - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
>>>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>>>>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>>>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>>>>
>>>>>>>
>>>>>>> First starting a yarn-session and then loading my job gives the same
>>>>>>> error.
>>>>>>>
>>>>>>> My best guess at this point is that Flink needs the same fix as
>>>>>>> described here:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/SPARK-6918
>>>>>>> ( https://github.com/apache/spark/pull/5586 )
>>>>>>>
>>>>>>> What do you guys think?
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Niels,
>>>>>>>>
>>>>>>>> You're welcome.
>>>>>>>> Some more information on how this would be configured:
>>>>>>>>
>>>>>>>> In the kdc.conf, there are two variables:
>>>>>>>>
>>>>>>>> max_life = 2h 0m 0s
>>>>>>>> max_renewable_life = 7d 0h 0m 0s
>>>>>>>>
>>>>>>>> max_life is the maximum life of the current ticket. However, it may
>>>>>>>> be renewed up to a time span of max_renewable_life from the first
>>>>>>>> ticket issue on. This means that from the first ticket issue, new
>>>>>>>> tickets may be requested for one week. Each renewed ticket has a
>>>>>>>> life time of max_life (2 hours in this case).
>>>>>>>>
>>>>>>>> Please let us know about any difficulties with long-running
>>>>>>>> streaming applications and Kerberos.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Thanks for your feedback.
>>>>>>>>> So I guess I'll have to talk to the security guys about having
>>>>>>>>> special kerberos ticket expiry times for these types of jobs.
>>>>>>>>>
>>>>>>>>> Niels Basjes
>>>>>>>>>
>>>>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Niels,
>>>>>>>>>>
>>>>>>>>>> Thank you for your question. Flink relies entirely on the Kerberos
>>>>>>>>>> support of Hadoop. So your question could also be rephrased to
>>>>>>>>>> "Does Hadoop support long-term authentication using Kerberos?".
>>>>>>>>>> And the answer is: Yes!
>>>>>>>>>>
>>>>>>>>>> While Hadoop uses Kerberos tickets to authenticate users with
>>>>>>>>>> services initially, the authentication process continues
>>>>>>>>>> differently afterwards. Instead of saving the ticket to
>>>>>>>>>> authenticate on a later access, Hadoop creates its own security
>>>>>>>>>> tokens (DelegationToken) that it passes around. These are
>>>>>>>>>> authenticated to Kerberos periodically.
>>>>>>>>>> To my knowledge, the tokens have a life span identical to the
>>>>>>>>>> Kerberos ticket maximum life span. So be sure to set the maximum
>>>>>>>>>> life span very high for long streaming jobs. The renewal time, on
>>>>>>>>>> the other hand, is not important because Hadoop abstracts this
>>>>>>>>>> away using its own security tokens.
>>>>>>>>>>
>>>>>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on Yarn,
>>>>>>>>>> then it is sufficient to authenticate the client with Kerberos.
>>>>>>>>>> On a Flink standalone cluster you need to ensure that, initially,
>>>>>>>>>> all nodes are authenticated with Kerberos using the kinit tool.
>>>>>>>>>>
>>>>>>>>>> Feel free to ask if you have more questions and let us know about
>>>>>>>>>> any difficulties.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Max
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> > Hi,
>>>>>>>>>> >
>>>>>>>>>> > I want to write a long running (i.e. never stop it) streaming
>>>>>>>>>> > flink application on a kerberos secured Hadoop/Yarn cluster. My
>>>>>>>>>> > application needs to do things with files on HDFS and HBase
>>>>>>>>>> > tables on that cluster so having the correct kerberos tickets is
>>>>>>>>>> > very important. The stream is to be ingested from Kafka.
>>>>>>>>>> >
>>>>>>>>>> > One of the things with Kerberos is that the tickets expire
>>>>>>>>>> > after a predetermined time. My knowledge about kerberos is very
>>>>>>>>>> > limited so I hope you guys can help me.
>>>>>>>>>> >
>>>>>>>>>> > My question is actually quite simple: Is there a how-to
>>>>>>>>>> > somewhere on how to correctly run a long running flink
>>>>>>>>>> > application with kerberos that includes a solution for the
>>>>>>>>>> > kerberos ticket timeout?
>>>>>>>>>> >
>>>>>>>>>> > Thanks
>>>>>>>>>> >
>>>>>>>>>> > Niels Basjes
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>
>>>>>>>>> Niels Basjes
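
For reference, the max_life / max_renewable_life arithmetic from Max's Oct 27 message can be worked through in a few lines of Java. This is only a sketch: the class and method names (TicketWindow, ticketsToCover, renewalsNeeded) are made up for illustration and are not part of Flink, Hadoop, or Kerberos. It also assumes every ticket is renewed exactly at the moment it expires, so the result is a lower bound; a real client renews earlier and therefore more often.

```java
import java.time.Duration;

/** Illustrative arithmetic only; not a Flink, Hadoop, or Kerberos API. */
final class TicketWindow {

    /** Number of tickets of length maxLife needed to span maxRenewableLife (rounded up). */
    static long ticketsToCover(Duration maxLife, Duration maxRenewableLife) {
        if (maxLife.isZero() || maxLife.isNegative()) {
            throw new IllegalArgumentException("maxLife must be positive");
        }
        long life = maxLife.getSeconds();
        long window = maxRenewableLife.getSeconds();
        // Ceiling division: a partially used last ticket still counts as one ticket.
        return (window + life - 1) / life;
    }

    /** The first ticket is the initial issue, so renewals = tickets - 1 (never negative). */
    static long renewalsNeeded(Duration maxLife, Duration maxRenewableLife) {
        return Math.max(0, ticketsToCover(maxLife, maxRenewableLife) - 1);
    }

    public static void main(String[] args) {
        // Values quoted from the kdc.conf example in the thread.
        Duration maxLife = Duration.ofHours(2);
        Duration maxRenewableLife = Duration.ofDays(7);

        System.out.println("tickets: " + ticketsToCover(maxLife, maxRenewableLife)
                + ", renewals: " + renewalsNeeded(maxLife, maxRenewableLife));
    }
}
```

With the quoted values (2 hours and 7 days) this gives 84 ticket lifetimes and at least 83 renewals. Once the renewable window is used up, the ticket cannot be renewed any further and a fresh login is needed (for example from a keytab), which is why the window matters for a job that is never supposed to stop.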
