Hi Udit,

It seems like you are trying to import pyspark. What code were you trying
to execute? Could you share it?
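In the meantime, one thing worth checking: the failure happens inside
z:org.apache.spark.api.python.PythonRDD.runJob, so even a trivial paragraph
that ships a task to a Python worker should reproduce it. Here is a small
diagnostic sketch (assuming Zeppelin's %pyspark interpreter, where sc is the
SparkContext the interpreter provides) that reports the PYTHONPATH and
interpreter the executor-side worker actually sees, so you can compare them
with your driver-side values:

    %pyspark
    # Diagnostic only, not a fix: run a single trivial task and report the
    # environment the executor's Python worker actually sees.
    import os
    import sys

    def worker_env(_):
        # This function runs on the executor, not the driver.
        return (os.environ.get("PYTHONPATH"), sys.executable)

    print(sc.parallelize([0], 1).map(worker_env).collect())

If the collect() itself dies with the same "No module named pyspark" error,
that confirms the worker-side PYTHONPATH is broken independently of your
notebook code.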
Paul

On Thu, May 19, 2016 at 7:46 PM Udit Mehta <ume...@groupon.com> wrote:

> Hi All,
>
> I keep getting this error when trying to run Pyspark on *Zeppelin 0.5.6*:
>
>> Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.runJob. :
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3, worker92.snc1): org.apache.spark.SparkException: Error from
>> python worker: /usr/local/bin/python2.7: No module named pyspark
>> PYTHONPATH was:
>> /data/vol4/nodemanager/usercache/umehta/filecache/29/spark-assembly-1.5.2-hadoop2.6.0.jar
>> java.io.EOFException
>
> I read online that the solution for this was to set the PYTHONPATH. Here
> are my settings:
>
>> System.getenv().get("MASTER")
>> System.getenv().get("SPARK_YARN_JAR")
>> System.getenv().get("HADOOP_CONF_DIR")
>> System.getenv().get("JAVA_HOME")
>> System.getenv().get("SPARK_HOME")
>> System.getenv().get("PYSPARK_PYTHON")
>> System.getenv().get("PYTHONPATH")
>> System.getenv().get("ZEPPELIN_JAVA_OPTS")
>>
>> res0: String = null
>> res1: String = null
>> res2: String = /etc/hadoop/conf
>> res3: String = null
>> res4: String = /var/umehta/spark-1.5.2
>> res5: String = python2.7
>> res6: String = /var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/build:
>> res7: String = -Dhdp.version=2.2.0.0-2041
>
> And lastly, here is my zeppelin-env.sh:
>
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041"
>>
>> export ZEPPELIN_LOG_DIR=/home/umehta/zeppelin-data/logs
>> export ZEPPELIN_PID_DIR=/home/umehta/zeppelin-data/run
>> export ZEPPELIN_NOTEBOOK_DIR=/home/umehta/zeppelin-data/notebook
>> export ZEPPELIN_IDENT_STRING=umehta
>>
>> export ZEPPELIN_CLASSPATH_OVERRIDES=:/usr/hdp/current/share/lzo/0.6.0/lib/*
>>
>> export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
>> export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH,PYTHONPATH=${PYTHONPATH}"
>> export PYSPARK_PYTHON=python2.7
>
> Has anyone else faced this issue, and does anyone have pointers on where
> I might be going wrong? The problem is only with Pyspark, while Spark and
> Spark SQL run fine.
>
> Thanks in advance,
> Udit
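P.S. One detail that stands out in the stack trace: the PYTHONPATH the
failing worker reports contains only the assembly jar, which suggests the
value you set via SPARK_YARN_USER_ENV never reaches the YARN containers. As
a sketch of a possible workaround (I haven't verified this against Zeppelin
0.5.6), you could try passing it through Spark's spark.executorEnv.*
mechanism instead, e.g. in conf/spark-defaults.conf:

    spark.executorEnv.PYTHONPATH  /var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip

spark.executorEnv.[EnvironmentVariableName] is Spark's standard way to set
an environment variable on the executor processes; the paths above are just
the ones from your res6 output, collapsed to the two entries PySpark needs
(the python directory and the py4j zip).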