CS: What you’re doing is compiling two versions of Zeppelin from source on top of a binary of a third version. That’s going to give you trouble.
The R interpreter you're using doesn't interface with Zeppelin's Spark installation at all; all it shares is the name. So none of what you've been doing, recompiling Zeppelin or Spark or whatever, has any impact on R working with Hive. Whether R works with Hive for you is incidental.

I suggest you start from a clean installation and install https://github.com/elbamos/Zeppelin-With-R from source. You should not need to specify -Pyarn, -Phive, etc. The R interpreter in that package will use the same Spark as the rest of Zeppelin. To install, just run:

    mvn package install -DskipTests

At runtime, set the environment variable SPARK_HOME to point to your existing, separately compiled installation of Spark. Zeppelin should try to use Hive by default, and the R interpreter will use whatever the rest of Zeppelin uses.

Also: @FelixCheung, you have no business trying to provide support for anyone on this project, and you certainly have no business giving anyone advice about using R with it.

From: cs user <acldstk...@gmail.com>
Reply: users@zeppelin.incubator.apache.org <users@zeppelin.incubator.apache.org>
Date: December 16, 2015 at 5:27:20 AM
To: users@zeppelin.incubator.apache.org <users@zeppelin.incubator.apache.org>
Subject: Re: Zeppelin+spark+R+hive

Hi All,

Many thanks for getting back to me. I've managed to get this working by downloading the tagged Spark 1.5.2 release and compiling it with:

    ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver -Psparkr

I've then downloaded the source for this version of Zeppelin: https://github.com/datalayer/zeppelin-R

Then compiled it with (based on the README from the above project):

    mvn clean install -Pyarn -Pspark-1.5 -Dspark.version=1.5.2 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Ppyspark -Dmaven.findbugs.enable=false -Drat.skip=true -Dcheckstyle.skip=true -DskipTests -pl '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'

Within Zeppelin this allows Spark to run on YARN, and it also allows the R interpreter to be used with Hive. Hope this helps someone else :-)

Cheers!

On Tue, Dec 15, 2015 at 5:37 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

I believe that is not going to solve the problem.

If you need to run Spark on YARN (assuming that is your requirement), ensure that you run it in YARN client mode. YARN cluster mode is not supported with Zeppelin yet.

Regards,
Sourav

On Tue, Dec 15, 2015 at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

If you are not using YARN, try building your Spark distribution without this: -Pyarn?

On Tue, Dec 15, 2015 at 12:31 AM -0800, "cs user" <acldstk...@gmail.com> wrote:

Hi Folks,

We've been playing around with this project: https://github.com/datalayer/zeppelin-R

However, when we try to write a notebook using R which requires Hive, we run into the following:

    Error in value[[3L]](cond): Spark SQL is not built with Hive support

This is when we are using the pre-compiled Spark with Hadoop 2.6 support. To work around this, I've tried recompiling Spark with Hive support. Accessing the Hive context within an R notebook now works fine.
However, it is then impossible to run existing notebooks which try to submit jobs via YARN; the following error is encountered:

    java.lang.NoSuchMethodException: org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri()
        at java.lang.Class.getMethod(Class.java:1678)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464)
        at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

If I switch back to the old Spark home, these jobs then work fine again.

I am compiling our custom version of Spark with the following:

    ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver

Are there any other switches I need to add to overcome the above error?

Thanks!
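As a minimal sketch of the clean-install path recommended at the top of this thread: the mvn command and the SPARK_HOME advice come from the reply above, while the git clone step, the example SPARK_HOME path, and the bin/zeppelin-daemon.sh start command are ordinary Zeppelin conventions assumed here rather than taken from the thread.

    # Build Zeppelin-With-R from source; no -Pyarn/-Phive profiles should be needed.
    git clone https://github.com/elbamos/Zeppelin-With-R.git
    cd Zeppelin-With-R
    mvn package install -DskipTests

    # At runtime, point Zeppelin at the separately built Spark distribution.
    # The path below is a placeholder for wherever that build was unpacked.
    export SPARK_HOME=/opt/spark-1.5.2-bin-custom-spark
    bin/zeppelin-daemon.sh start

The idea is that the R interpreter then picks up whatever Spark (and Hive) configuration the rest of Zeppelin uses, instead of depending on which profiles Zeppelin itself was compiled with.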
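Separately, since the thread hinges on whether a given Spark build actually includes Hive support, one rough way to check a distribution before wiring it into Zeppelin is to look for Spark SQL's Hive classes in the assembly jar. This assumes the lib/spark-assembly-*.jar layout produced by a Spark 1.5.x make-distribution.sh build; it is not a step suggested in the thread itself.

    # Print a match if the custom Spark build contains Spark SQL's Hive classes;
    # no output means the distribution was built without -Phive.
    jar tf "$SPARK_HOME"/lib/spark-assembly-*.jar | grep -m 1 'org/apache/spark/sql/hive/HiveContext'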