Hi folks,

We've been playing around with this project: https://github.com/datalayer/zeppelin-R

However, when we try to write a notebook in R that requires Hive, we run into the following:

    Error in value[[3L]](cond): Spark SQL is not built with Hive support

This is with the pre-compiled Spark built against Hadoop 2.6. To work around it, I tried recompiling Spark with Hive support. Accessing the Hive context from an R notebook then works fine; however, it then becomes impossible to run existing notebooks that submit jobs via YARN, which fail with:

    java.lang.NoSuchMethodException: org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri()
        at java.lang.Class.getMethod(Class.java:1678)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464)
        at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

If I switch back to the old Spark home, those jobs work fine again.

I am compiling our custom version of Spark with:

    ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver

Are there any other switches I need to add to overcome the above error?

Thanks!
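
P.S. As a quick sanity check on the custom build, something like the following should confirm whether the Hive classes actually made it into the distribution (a rough sketch; it assumes SPARK_HOME points at the custom build and the usual Spark 1.x layout with a single assembly jar under lib/):

    # List the assembly jar's contents and count entries for the Hive support classes.
    # A count of zero would mean the Hive classes didn't make it into that build.
    jar tf "$SPARK_HOME"/lib/spark-assembly-*.jar | grep -c 'org/apache/spark/sql/hive/HiveContext'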
https://github.com/datalayer/zeppelin-R However when we try and write a notebook using R which requires hive, we run into the following: Error in value[[3L]](cond): Spark SQL is not built with Hive support This is when we are using the pre compiled spark with hadoop 2.6 support. To work around this, I've tried recompiling spark with hive support. Accessing the hive context within an R notebook now works fine. However, it is then impossible to run existing notebooks which try to submit jobs via yarn, the following error is encountered: java.lang.NoSuchMethodException: org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri() at java.lang.Class.getMethod(Class.java:1678) at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271) at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145) at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292) at org.apache.zeppelin.scheduler.Job.run(Job.java:170) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) If I switch back to the old spark home, these jobs then work fine again. I am compiling our custom version of spark with the following: ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver Are there any other switches I need to add to overcome the above error? Thanks!