CS: What you’re doing is compiling two versions of Zeppelin from source on top of a binary of a third version. That’s going to give you trouble.
The R interpreter you're using doesn't interface with Zeppelin's Spark installation at all; all it shares is the name. So none of what you've been doing, recompiling Zeppelin or Spark or whatever, has any impact on R working with Hive. Whether R works with Hive for you is incidental.

I suggest you start from a clean installation and install https://github.com/elbamos/Zeppelin-With-R from source. You should not need to specify -Pyarn, -Phive, etc. The R interpreter in that package will use the same Spark as the rest of Zeppelin. To install, just run:

    mvn package install -DskipTests

At runtime, set the environment variable SPARK_HOME to point to your existing, separately compiled installation of Spark. Zeppelin should try to use Hive by default, and the R interpreter will use whatever the rest of Zeppelin uses.

Also: @FelixCheung, you have no business trying to provide support for anyone on this project, and you certainly have no business giving anyone advice about using R with it.

From: cs user <acldstk...@gmail.com>
Reply: users@zeppelin.incubator.apache.org <users@zeppelin.incubator.apache.org>
Date: December 16, 2015 at 5:27:20 AM
To: users@zeppelin.incubator.apache.org <users@zeppelin.incubator.apache.org>
Subject: Re: Zeppelin+spark+R+hive

Hi All,

Many thanks for getting back to me. I've managed to get this working by downloading the tagged Spark 1.5.2 release and compiling it with:

    ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver -Psparkr

I've then downloaded the source for this version of Zeppelin: https://github.com/datalayer/zeppelin-R

Then compiled it with (based on the README from the above project):

    mvn clean install -Pyarn -Pspark-1.5 -Dspark.version=1.5.2 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Ppyspark -Dmaven.findbugs.enable=false -Drat.skip=true -Dcheckstyle.skip=true -DskipTests -pl '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'

Within Zeppelin this allows Spark to run on YARN, and it also allows the R interpreter to be used with Hive. Hope this helps someone else :-)

Cheers!

On Tue, Dec 15, 2015 at 5:37 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

I believe that is not going to solve the problem.

If you need to run Spark on YARN (assuming that is your requirement), ensure that you run it in YARN client mode. YARN cluster mode is not supported with Zeppelin yet.

Regards,
Sourav

On Tue, Dec 15, 2015 at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

If you are not using YARN, try building your Spark distribution without this: -Pyarn?

On Tue, Dec 15, 2015 at 12:31 AM -0800, "cs user" <acldstk...@gmail.com> wrote:

Hi Folks,

We've been playing around with this project: https://github.com/datalayer/zeppelin-R

However, when we try to write a notebook using R which requires Hive, we run into the following:

    Error in value[[3L]](cond): Spark SQL is not built with Hive support

This is when we are using the pre-compiled Spark with Hadoop 2.6 support. To work around this, I've tried recompiling Spark with Hive support. Accessing the Hive context within an R notebook now works fine.
However, it is then impossible to run existing notebooks which try to submit jobs via YARN; the following error is encountered:

    java.lang.NoSuchMethodException: org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri()
        at java.lang.Class.getMethod(Class.java:1678)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464)
        at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

If I switch back to the old Spark home, these jobs then work fine again.

I am compiling our custom version of Spark with the following:

    ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver

Are there any other switches I need to add to overcome the above error?

Thanks!
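As a minimal sketch of the clean-install path recommended at the top of this thread: the mvn command and the SPARK_HOME advice come from the reply above, while the git clone step, the example SPARK_HOME path, and the bin/zeppelin-daemon.sh start command are ordinary Zeppelin conventions assumed here rather than taken from the thread.

    # Build Zeppelin-With-R from source; no -Pyarn/-Phive profiles should be needed.
    git clone https://github.com/elbamos/Zeppelin-With-R.git
    cd Zeppelin-With-R
    mvn package install -DskipTests

    # At runtime, point Zeppelin at the separately built Spark distribution.
    # The path below is a placeholder for wherever that build was unpacked.
    export SPARK_HOME=/opt/spark-1.5.2-bin-custom-spark
    bin/zeppelin-daemon.sh start

The idea is that the R interpreter then picks up whatever Spark (and Hive) configuration the rest of Zeppelin uses, instead of depending on which profiles Zeppelin itself was compiled with.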
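Separately, since the thread hinges on whether a given Spark build actually includes Hive support, one rough way to check a distribution before wiring it into Zeppelin is to look for Spark SQL's Hive classes in the assembly jar. This assumes the lib/spark-assembly-*.jar layout produced by a Spark 1.5.x make-distribution.sh build; it is not a step suggested in the thread itself.

    # Print a match if the custom Spark build contains Spark SQL's Hive classes;
    # no output means the distribution was built without -Phive.
    jar tf "$SPARK_HOME"/lib/spark-assembly-*.jar | grep -m 1 'org/apache/spark/sql/hive/HiveContext'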