Re: Zeppelin + Spark On EMR?

Eugene Fri, 18 Sep 2015 06:14:47 -0700

Hi Anders,

I also ran into the error you mention and got past it by:


   1. using the Spark installation that comes with Zeppelin
   2. setting properties such as "spark.executor.instances",
   "spark.executor.cores" and "spark.default.parallelism" in
   conf/interpreter.json, taking the values from spark-defaults.conf. I
   parsed that file using parts of your gist.

Code looks like this:

cd ~/zeppelin/conf/
SPARK_DEFAULTS=~/emr-spark-defaults.conf
# Pull each value (second column) out of the copied spark-defaults.conf
SPARK_EXECUTOR_INSTANCES=$(grep spark.executor.instances "$SPARK_DEFAULTS" | awk '{print $2}')
SPARK_EXECUTOR_CORES=$(grep spark.executor.cores "$SPARK_DEFAULTS" | awk '{print $2}')
SPARK_EXECUTOR_MEMORY=$(grep spark.executor.memory "$SPARK_DEFAULTS" | awk '{print $2}')
SPARK_DEFAULT_PARALLELISM=$(grep spark.default.parallelism "$SPARK_DEFAULTS" | awk '{print $2}')
# Write the values into the Spark interpreter setting (its id is 2B188AQ5T
# in my interpreter.json; check the id in yours)
jq ".interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.instances\" = \"${SPARK_EXECUTOR_INSTANCES}\" |
    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.cores\" = \"${SPARK_EXECUTOR_CORES}\" |
    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.memory\" = \"${SPARK_EXECUTOR_MEMORY}\" |
    .interpreterSettings.\"2B188AQ5T\".properties.\"spark.default.parallelism\" = \"${SPARK_DEFAULT_PARALLELISM}\"" \
    interpreter.json > interpreter.json_
# Copy the result back over the original, then remove the temporary file
cat interpreter.json_ > interpreter.json
rm interpreter.json_
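
One caveat about the grep lines above: the pattern is not anchored, so it
would also pick up a commented-out line or a longer property name that
happens to contain the same text. A stricter lookup could be sketched as
below. The function name get_spark_prop is just something I made up for
illustration, and it assumes the file has plain whitespace-separated
"key value" lines, which is what the awk '{print $2}' above assumes too.

```shell
# Hypothetical helper (not part of any gist in this thread): print the value
# of one property from a spark-defaults.conf-style file.
get_spark_prop() {
    # $1 = exact property name, $2 = path to the defaults file
    # Match only when the first field equals the key exactly, so commented
    # lines and longer property names are ignored. Print the second field of
    # the first match and stop.
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Example usage, with the same file as in the script above:
# SPARK_EXECUTOR_CORES=$(get_spark_prop spark.executor.cores ~/emr-spark-defaults.conf)
```

Like the original, this only returns the second field, so a value containing
spaces would be cut off at the first space.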


2015-09-18 17:05 GMT+04:00 Anders Hammar <[email protected]>:

> Hi,
>
> Thank you Phil for updating my script to support the latest version of EMR.
> I have edited my gist so that it includes some of your updates plus some
> additional changes.
>
> https://gist.github.com/andershammar/224e1077021d0ea376dd
>
> While on the subject, has anyone been able to get Zeppelin to work together
> with Amazon's Spark installation on Amazon EMR 4.x (by exporting
> SPARK_HOME and HADOOP_HOME instead)? When I try this, I get the
> following exception:
>
> org.apache.spark.SparkException: Found both spark.driver.extraClassPath
> and SPARK_CLASSPATH. Use only the former.
>     at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:444)
>     at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:442)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:442)
>     at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:430)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:430)
>     ...
>
> From a quick look at it, the problem seems to be that the Amazon
> installation of Spark uses SPARK_CLASSPATH to add additional libraries
> (/etc/spark/conf/spark-env.sh) while Zeppelin uses "spark-submit
> --driver-class-path" (zeppelin/bin/interpreter.sh).
>
> Any ideas?
>
> Best regards,
> Anders
>
>
> On Wed, Sep 9, 2015 at 5:09 PM, Eugene <[email protected]> wrote:
>
>> Here's a bit shorter alternative, too
>>
>> https://gist.github.com/snowindy/008f3e8b878a23c00679
>>
>> 2015-09-09 18:58 GMT+04:00 shahab <[email protected]>:
>>
>>> Thanks Phil, it works. Great job and well done!
>>>
>>> best,
>>> /Shahab
>>>
>>> On Mon, Sep 7, 2015 at 6:32 PM, Phil Wills <[email protected]> wrote:
>>>
>>>> Anders' script is a bit out of date if you're using the latest version
>>>> of EMR.  Here's my fork:
>>>>
>>>> https://gist.github.com/philwills/71539f833f57338236b5
>>>>
>>>> which worked OK for me fairly recently.
>>>>
>>>> Phil
>>>>
>>>> On Mon, 7 Sep 2015 at 10:01 shahab <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use Zeppelin to work with Spark on Amazon EMR. I used
>>>>> the script provided by Anders (
>>>>> https://gist.github.com/andershammar/224e1077021d0ea376dd) to set up
>>>>> Zeppelin. Zeppelin can connect to Spark, but when I run the tutorials
>>>>> I get the following error:
>>>>>
>>>>> ...FileNotFoundException: File
>>>>> file:/home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar
>>>>> does not exist
>>>>>
>>>>> However, the above file does exist at that path on the master node.
>>>>>
>>>>> I would appreciate it if anyone could share their experience of
>>>>> setting up Zeppelin with EMR.
>>>>>
>>>>> best,
>>>>> /Shahab
>>>>>
>>>>>
>>>
>>
>>
>> --
>>
>>
>> Best regards,
>> Eugene.
>>
>
>


-- 


Best regards,
Eugene.
