Niels Basjes created FLINK-2977:
-----------------------------------

             Summary: Cannot access HBase in a Kerberos secured Yarn cluster
                 Key: FLINK-2977
                 URL: https://issues.apache.org/jira/browse/FLINK-2977
             Project: Flink
          Issue Type: Bug
          Components: YARN Client
            Reporter: Niels Basjes

I have created a very simple Flink topology consisting of a streaming Source (that outputs the timestamp a few times per second) and a Sink (that puts that timestamp into a single record in HBase).
Running this on a non-secure Yarn cluster works fine.

To run it on a secured Yarn cluster my main routine now looks like this:

{code}
public static void main(String[] args) throws Exception {
    // Log in from the keytab before the job is built and submitted.
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
    UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net", "/home/nbasjes/.krb/nbasjes.keytab");

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    // Source emits timestamps; sink writes each one to a single HBase row.
    DataStream<String> stream = env.addSource(new TimerTicksSource());
    stream.addSink(new SetHBaseRowSink());

    env.execute("Long running Flink application");
}
{code}

When I run this with

    flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar

I see after the startup messages:

{quote}
17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation - Login successful for user nbas...@xxxxxx.net using keytab file /home/nbasjes/.krb/nbasjes.keytab
11/03/2015 17:13:25 Job execution switched to status RUNNING.
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
{quote}

That looks good. However, no data goes into HBase.
After some digging I found this error in the task manager's log:

{quote}
17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
{quote}

First starting a yarn-session and then loading my job gives the same error.

My best guess at this point is that Flink needs the same fix as described here:

https://issues.apache.org/jira/browse/SPARK-6918
( https://github.com/apache/spark/pull/5586 )
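
One possible reading of the logs above, not confirmed in this issue: the keytab login in main() happens in the client JVM, while the sink that talks to HBase runs in a task manager JVM that never performed a login. The plain-Java sketch below only illustrates that scoping. It uses standard Java serialization as a stand-in for shipping the sink to a worker; SketchSink, login() and ship() are hypothetical names, and no Flink, Hadoop or HBase API is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LoginScopeSketch {

    /** Hypothetical stand-in for a sink; this is not a Flink class. */
    public static class SketchSink implements Serializable {
        private static final long serialVersionUID = 1L;

        // Stands in for per-JVM login state (such as a Kerberos ticket).
        // Marked transient, so it is never part of the serialized form.
        private transient String credentials;

        // Plain configuration; this does travel with the serialized object.
        private final String principal;

        public SketchSink(String principal) {
            this.principal = principal;
        }

        /** Simulates a login on whichever JVM calls it. */
        public void login() {
            credentials = "ticket-for-" + principal;
        }

        public boolean hasCredentials() {
            return credentials != null;
        }
    }

    /** Simulates shipping the sink to a worker: serialize, then deserialize. */
    public static SketchSink ship(SketchSink sink) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sink);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SketchSink) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchSink clientSide = new SketchSink("user@EXAMPLE.COM");
        clientSide.login(); // the login happens only on the "client"

        SketchSink workerSide = ship(clientSide);
        System.out.println("client has credentials: " + clientSide.hasCredentials());
        System.out.println("worker has credentials: " + workerSide.hasCredentials());

        workerSide.login(); // the worker has to establish its own credentials
        System.out.println("worker after own login: " + workerSide.hasCredentials());
    }
}
```

If that reading holds, credentials would have to be established on, or delegated to, the task manager side, which appears to be the direction of the Spark change linked above.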