First of all, I would say that if you have trouble with
https://github.com/datalayer/zeppelin-R
you'd better ask directly on that repository, or ask echarles for help,
since it isn't part of the https://github.com/apache/incubator-zeppelin
repository.
I think he will be able to give you better answers regarding your error.

@Amos
Zeppelin is an open-source project, and we welcome any kind of
contribution, including help on the mailing list, since we can't answer
every thread.
Your remark about @FelixCheung doesn't belong here, and it doesn't help
resolve @csuser's issue.
Furthermore, you shouldn't be telling him what he can or cannot do;
whatever grudge you hold has no place on this mailing list.

On Thu, Dec 17, 2015 at 2:30 AM, Amos B. Elberg <amos.elb...@me.com> wrote:

> CS:   What you’re doing is compiling two versions of Zeppelin from source
> on top of a binary of a third version.  That’s going to give you trouble.
>
> The R interpreter you’re using doesn’t interface with Zeppelin’s Spark
> installation at all.  All it shares is the name.  So none of the things
> you’ve been doing, recompiling Zeppelin or Spark or whatever, actually
> has any impact on R working with Hive.  Whether R works with Hive, for
> you, is incidental.
>
> I suggest you start from a clean installation and install this
> https://github.com/elbamos/Zeppelin-With-R from source.
>
> You should not need to specify -Pyarn, -Phive, etc. etc.   The R
> interpreter in the package will use the same Spark as the rest of Zeppelin.
>
> Just mvn package install -DskipTests to install.
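>
> A minimal sketch of that sequence, assuming you are building from a fresh
> clone with git and Maven already installed:
>
> git clone https://github.com/elbamos/Zeppelin-With-R
> cd Zeppelin-With-R
> mvn package install -DskipTests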
>
> At runtime, set the environment variable SPARK_HOME to point to your
> existing, separately compiled, installation of Spark.  Zeppelin should try
> to use Hive by default, and the R interpreter will use whatever the rest of
> Zeppelin uses.
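>
> A minimal sketch, e.g. in conf/zeppelin-env.sh (the path below is purely
> illustrative):
>
> export SPARK_HOME=/opt/spark-1.5.2-bin-hadoop2.6   # illustrative path to your Spark build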
>
> Also — @FelixCheung, you have no business trying to provide support for
> anyone on this project, and you certainly have no business giving anyone
> advice about using R with it.
>
>
> From: cs user <acldstk...@gmail.com>
> Reply: users@zeppelin.incubator.apache.org
> Date: December 16, 2015 at 5:27:20 AM
> To: users@zeppelin.incubator.apache.org
> Subject: Re: Zeppelin+spark+R+hive
>
> Hi All,
>
> Many thanks for getting back to me. I've managed to get this working by
> downloading the tagged Spark 1.5.2 release and compiling it with:
>
> ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
> -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver -Psparkr
>
> I've then downloaded the source for this version of Zeppelin:
>
> https://github.com/datalayer/zeppelin-R
>
> Then compiled it with (based on the readme from the above project):
>
> mvn clean install -Pyarn -Pspark-1.5 -Dspark.version=1.5.2
> -Dhadoop.version=2.6.0 -Phadoop-2.6 -Ppyspark -Dmaven.findbugs.enable=false
> -Drat.skip=true -Dcheckstyle.skip=true -DskipTests -pl
> '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'
>
> Within Zeppelin this allows Spark to run on YARN, and it also makes it
> possible to use the R interpreter with Hive.
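>
> For anyone reproducing this, the runtime wiring is roughly the following,
> e.g. in conf/zeppelin-env.sh (the paths are illustrative and will differ
> per install):
>
> export SPARK_HOME=/opt/spark-1.5.2-bin-custom-spark   # illustrative path to the custom build
> export HADOOP_CONF_DIR=/etc/hadoop/conf               # illustrative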
>
> Hope this helps someone else :-)
>
> Cheers!
>
> On Tue, Dec 15, 2015 at 5:37 PM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> I believe that is not going to solve the problem.
>>
>> If you need to run Spark on YARN (assuming that is your requirement),
>> make sure you run it in YARN client mode. YARN cluster mode is not
>> supported with Zeppelin yet.
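>>
>> For example (purely illustrative), one place to set that is
>> conf/zeppelin-env.sh:
>>
>> export MASTER=yarn-client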
>>
>> Regards,
>> Sourav
>>
>>
>> On Tue, Dec 15, 2015 at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>>> If you are not using YARN, try building your Spark distribution without
>>> this:
>>>  -Pyarn
>>> ?
>>>
>>>
>>>
>>> On Tue, Dec 15, 2015 at 12:31 AM -0800, "cs user" <acldstk...@gmail.com>
>>> wrote:
>>>
>>> Hi Folks,
>>>
>>> We've been playing around with this project:
>>>
>>> https://github.com/datalayer/zeppelin-R
>>>
>>> However, when we try to write a notebook using R which requires Hive, we
>>> run into the following:
>>>
>>> Error in value[[3L]](cond): Spark SQL is not built with Hive support
>>>
>>> This is when we are using the pre-compiled Spark with Hadoop 2.6
>>> support.
>>>
>>> To work around this, I've tried recompiling Spark with Hive support.
>>> Accessing the Hive context within an R notebook now works fine.
>>>
>>> However, it is then impossible to run existing notebooks which try to
>>> submit jobs via YARN; the following error is encountered:
>>>
>>> java.lang.NoSuchMethodException:
>>> org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri() at
>>> java.lang.Class.getMethod(Class.java:1678) at
>>> org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271)
>>> at
>>> org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
>>> at
>>> org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464)
>>> at
>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>> at
>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>> at
>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>>> at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) at
>>> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> If I switch back to the old Spark home, these jobs then work fine again.
>>>
>>> I am compiling our custom version of Spark with the following:
>>>
>>> ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
>>> -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver
>>>
>>> Are there any other switches I need to add to overcome the above error?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>
