I actually figured it out. All I had to do was add the config:
spark.yarn.isPython=true
in the interpreter settings or zeppelin-env.sh.
This is from a PR I came across:
https://github.com/apache/incubator-zeppelin/pull/605
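
For reference, here is roughly how I applied it. This is a sketch, not verbatim from the PR; the ZEPPELIN_JAVA_OPTS route is an assumption on my part, based on how -Dhdp.version is already passed in my zeppelin-env.sh below:

# Option 1: in the Zeppelin UI, add a property to the Spark interpreter settings:
#   name:  spark.yarn.isPython
#   value: true
#
# Option 2: in conf/zeppelin-env.sh, pass it as a -D flag alongside the existing
# hdp.version property (assumes -D opts are forwarded to the Spark interpreter):
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041 -Dspark.yarn.isPython=true"

Restart the Spark interpreter after either change so it takes effect.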

On Thu, May 19, 2016 at 4:15 PM, Paul Bustios Belizario <pbust...@gmail.com>
wrote:

> Hi Udit,
>
> Seems like you are trying to import pyspark. What was the code you tried
> to execute? Could you share it?
>
> Paul
>
>
> On Thu, May 19, 2016 at 7:46 PM Udit Mehta <ume...@groupon.com> wrote:
>
>> Hi All,
>>
>> I keep getting this error when trying to run PySpark on *Zeppelin 0.5.6*:
>>
>> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
>>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, worker92.snc1):
>>> org.apache.spark.SparkException: Error from python worker:
>>>   /usr/local/bin/python2.7: No module named pyspark
>>> PYTHONPATH was:
>>>   /data/vol4/nodemanager/usercache/umehta/filecache/29/spark-assembly-1.5.2-hadoop2.6.0.jar
>>> java.io.EOFException
>>>
>>
>> I read online that the solution for this was to set PYTHONPATH. Here are
>> my settings:
>>
>> System.getenv().get("MASTER")
>>> System.getenv().get("SPARK_YARN_JAR")
>>> System.getenv().get("HADOOP_CONF_DIR")
>>> System.getenv().get("JAVA_HOME")
>>> System.getenv().get("SPARK_HOME")
>>> System.getenv().get("PYSPARK_PYTHON")
>>> System.getenv().get("PYTHONPATH")
>>> System.getenv().get("ZEPPELIN_JAVA_OPTS")
>>>
>>> res0: String = null
>>> res1: String = null
>>> res2: String = /etc/hadoop/conf
>>> res3: String = null
>>> res4: String = /var/umehta/spark-1.5.2
>>> res5: String = python2.7
>>> res6: String = /var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/build:
>>> res7: String = -Dhdp.version=2.2.0.0-2041
>>
>>
>> And lastly, here is my zeppelin-env.sh:
>>
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041"
>>>
>>> export ZEPPELIN_LOG_DIR=/home/umehta/zeppelin-data/logs
>>> export ZEPPELIN_PID_DIR=/home/umehta/zeppelin-data/run
>>> export ZEPPELIN_NOTEBOOK_DIR=/home/umehta/zeppelin-data/notebook
>>> export ZEPPELIN_IDENT_STRING=umehta
>>>
>>> export ZEPPELIN_CLASSPATH_OVERRIDES=:/usr/hdp/current/share/lzo/0.6.0/lib/*
>>>
>>
>>> export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
>>> export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH,PYTHONPATH=${PYTHONPATH}"
>>> export PYSPARK_PYTHON=python2.7
>>
>>
>> Has anyone else faced this issue, and does anyone have pointers on where I
>> might be going wrong? The problem is only with PySpark; Spark and Spark SQL
>> run fine.
>>
>> Thanks in advance,
>> Udit
>>
>
