Scalding with Zeppelin 0.6.2: Job Failed, Null user

Benoit Hanotte Fri, 02 Dec 2016 02:05:57 -0800

Hello all,

I am using zeppelin to use spark in yarn-client mode on my company's
cluster and it works great. However I am now trying to run scalding in hdfs
mode on the same cluster but the jobs  always fail with the following error
logged:


16/12/02 10:54:27 DEBUG UserGroupInformation: hadoop login
16/12/02 10:54:27 DEBUG UserGroupInformation: hadoop login commit
16/12/02 10:54:27 DEBUG UserGroupInformation: Using user: "b.hanotte" with name b.hanotte
16/12/02 10:54:27 DEBUG UserGroupInformation: User entry: "b.hanotte"
16/12/02 10:54:27 DEBUG UserGroupInformation: UGI loginUser:b.hanotte (auth:SIMPLE)
16/12/02 10:54:27 ERROR Job: Job failed
java.lang.IllegalArgumentException: Null user
	at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1290)
	at org.apache.zeppelin.scalding.ScaldingInterpreter.interpret(ScaldingInterpreter.java:139)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


The username (b.hanotte) is valid, but the log reports the SIMPLE
authentication method, whereas I believe it should be Kerberos.
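
Hadoop chooses between SIMPLE and Kerberos from the `hadoop.security.authentication` property, whose default is `simple`. An `auth:SIMPLE` login therefore suggests, though does not prove, that the interpreter process did not pick up a core-site.xml setting it to `kerberos`. The following is a rough sketch for checking what a given core-site.xml declares; `hadoop_auth_mode` is a hypothetical helper, and its sed-based parsing assumes `<value>` directly follows `<name>` (no `<description>` in between), so it is not a real XML parser.

```shell
# Print the value of hadoop.security.authentication declared in a core-site.xml,
# or "simple" (Hadoop's default) when the property is not found.
# Assumption: <value> immediately follows <name> inside the <property> element.
hadoop_auth_mode() {
  conf_file="$1"
  if [ ! -r "$conf_file" ]; then
    echo "cannot read $conf_file" >&2
    return 1
  fi
  # Join the file onto one line so <name> and <value> can be matched together.
  value=$(tr -d '\n' < "$conf_file" \
    | sed -n 's:.*<name>hadoop.security.authentication</name>[[:space:]]*<value>\([^<]*\)</value>.*:\1:p')
  echo "${value:-simple}"
}
```

For example, `hadoop_auth_mode /etc/hadoop/conf/core-site.xml` printing `kerberos` would suggest the file itself is fine and that the Scalding interpreter process is not reading it.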

The configuration in /etc/hadoop/conf is correct, and Spark can use it to
connect to the cluster. I set the following environment variables:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_HOME=/usr/local/lib/hadoop-2.6
export HADOOP_USER_NAME=b.hanotte # necessary for scalding
export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export YARN_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export HADOOP_CLASSPATH="/criteo/scalding/scalding-repl-assembly-0.16.0-RC5.jar"
export ZEPPELIN_CLASSPATH_OVERRIDES="$HADOOP_CLASSPATH;$ZEPPELIN_CLASSPATH_OVERRIDES"
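
On Linux, Java classpath entries are separated by `:`, not `;`. Assuming Zeppelin's launch scripts hand ZEPPELIN_CLASSPATH_OVERRIDES to the JVM as-is (an assumption, not checked here), the value above would be read as one entry ending in `;`, and the Scalding jar might never reach the classpath. A small sketch to list the entries of a colon-separated classpath that do not exist on disk; `missing_classpath_entries` is a hypothetical helper, not part of Zeppelin or Hadoop.

```shell
# Print each entry of a colon-separated classpath that does not exist on disk.
# Note: Java wildcard entries such as /some/dir/* are reported as well, because
# the test below checks the literal path and does not expand the wildcard.
missing_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    # Skip empty entries, e.g. from a leading or trailing ':'.
    if [ -n "$entry" ] && [ ! -e "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}
```

With the exports above and an initially empty ZEPPELIN_CLASSPATH_OVERRIDES, `missing_classpath_entries "$ZEPPELIN_CLASSPATH_OVERRIDES"` would report `/criteo/scalding/scalding-repl-assembly-0.16.0-RC5.jar;` as missing, because the trailing `;` becomes part of the path.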

Is there some configuration that I am missing?

Thanks!

Benoit
