Hello all,

I am using Zeppelin to run Spark in yarn-client mode on my company's cluster, and it works great. However, I am now trying to run Scalding in hdfs mode on the same cluster, but the jobs always fail with the following error logged:

16/12/02 10:54:27 DEBUG UserGroupInformation: hadoop login
16/12/02 10:54:27 DEBUG UserGroupInformation: hadoop login commit
16/12/02 10:54:27 DEBUG UserGroupInformation: Using user: "b.hanotte" with name b.hanotte
16/12/02 10:54:27 DEBUG UserGroupInformation: User entry: "b.hanotte"
16/12/02 10:54:27 DEBUG UserGroupInformation: UGI loginUser:b.hanotte (auth:SIMPLE)
16/12/02 10:54:27 ERROR Job: Job failed
java.lang.IllegalArgumentException: Null user
    at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1290)
    at org.apache.zeppelin.scalding.ScaldingInterpreter.interpret(ScaldingInterpreter.java:139)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The username is valid (b.hanotte), but I believe the auth method (SIMPLE in the log) should be Kerberos. The configuration in /etc/hadoop/conf is correct, and Spark can use it to connect to the cluster.
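
A check along these lines could confirm which authentication method that configuration actually declares. It is only a sketch: the helper name declared_auth is hypothetical, and it assumes core-site.xml uses the usual layout where the <value> element directly follows the <name> element of the hadoop.security.authentication property.

```shell
#!/bin/sh
# Hypothetical helper: print the value of hadoop.security.authentication
# found in core-site.xml under the given conf dir.
# Falls back to $HADOOP_CONF_DIR, then /etc/hadoop/conf, when no argument is given.
declared_auth() {
  conf_dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
  # Flatten the XML to a single line so <name> and <value> can be matched together,
  # then keep only the text inside <value>...</value>.
  tr -d '\n' < "$conf_dir/core-site.xml" \
    | grep -o '<name>hadoop.security.authentication</name>[^<]*<value>[^<]*</value>' \
    | sed 's/.*<value>\([^<]*\)<\/value>.*/\1/'
}
```

Calling declared_auth /etc/hadoop/conf prints the declared value (for example kerberos or simple), and prints nothing if the property is not set in that file.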

I did set the following environment variables:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_HOME=/usr/local/lib/hadoop-2.6
export HADOOP_USER_NAME=b.hanotte
# necessary for scalding
export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export YARN_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export HADOOP_CLASSPATH="/criteo/scalding/scalding-repl-assembly-0.16.0-RC5.jar"
export ZEPPELIN_CLASSPATH_OVERRIDES="$HADOOP_CLASSPATH;$ZEPPELIN_CLASSPATH_OVERRIDES"

Is there some configuration that I am missing?

Thanks!
Benoit