Hi Udit,

It seems like you are trying to import pyspark. What code were you trying
to execute? Could you share it?
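In the meantime, one thing worth checking: the failure happens inside
z:org.apache.spark.api.python.PythonRDD.runJob, so even a trivial paragraph
that ships a task to a Python worker should reproduce it. Here is a small
diagnostic sketch (assuming Zeppelin's %pyspark interpreter, where sc is the
SparkContext the interpreter provides) that reports the PYTHONPATH and
interpreter the executor-side worker actually sees, so you can compare them
with your driver-side values:

    %pyspark
    # Diagnostic only, not a fix: run a single trivial task and report the
    # environment the executor's Python worker actually sees.
    import os
    import sys

    def worker_env(_):
        # This function runs on the executor, not the driver.
        return (os.environ.get("PYTHONPATH"), sys.executable)

    print(sc.parallelize([0], 1).map(worker_env).collect())

If the collect() itself dies with the same "No module named pyspark" error,
that confirms the worker-side PYTHONPATH is broken independently of your
notebook code.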
Paul

On Thu, May 19, 2016 at 7:46 PM Udit Mehta <ume...@groupon.com> wrote:

> Hi All,
>
> I keep getting this error when trying to run Pyspark on *Zeppelin 0.5.6*:
>
>> Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.runJob. :
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3, worker92.snc1): org.apache.spark.SparkException: Error from
>> python worker: /usr/local/bin/python2.7: No module named pyspark
>> PYTHONPATH was:
>> /data/vol4/nodemanager/usercache/umehta/filecache/29/spark-assembly-1.5.2-hadoop2.6.0.jar
>> java.io.EOFException
>
> I read online that the solution for this was to set the PYTHONPATH. Here
> are my settings:
>
>> System.getenv().get("MASTER")
>> System.getenv().get("SPARK_YARN_JAR")
>> System.getenv().get("HADOOP_CONF_DIR")
>> System.getenv().get("JAVA_HOME")
>> System.getenv().get("SPARK_HOME")
>> System.getenv().get("PYSPARK_PYTHON")
>> System.getenv().get("PYTHONPATH")
>> System.getenv().get("ZEPPELIN_JAVA_OPTS")
>>
>> res0: String = null
>> res1: String = null
>> res2: String = /etc/hadoop/conf
>> res3: String = null
>> res4: String = /var/umehta/spark-1.5.2
>> res5: String = python2.7
>> res6: String = /var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/build:
>> res7: String = -Dhdp.version=2.2.0.0-2041
>
> And lastly, here is my zeppelin-env.sh:
>
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041"
>>
>> export ZEPPELIN_LOG_DIR=/home/umehta/zeppelin-data/logs
>> export ZEPPELIN_PID_DIR=/home/umehta/zeppelin-data/run
>> export ZEPPELIN_NOTEBOOK_DIR=/home/umehta/zeppelin-data/notebook
>> export ZEPPELIN_IDENT_STRING=umehta
>>
>> export ZEPPELIN_CLASSPATH_OVERRIDES=:/usr/hdp/current/share/lzo/0.6.0/lib/*
>>
>> export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
>> export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH,PYTHONPATH=${PYTHONPATH}"
>> export PYSPARK_PYTHON=python2.7
>
> Has anyone else faced this issue, and does anyone have pointers on where
> I might be going wrong? The problem is only with Pyspark, while Spark and
> Spark SQL run fine.
>
> Thanks in advance,
> Udit
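P.S. One detail that stands out in the stack trace: the PYTHONPATH the
failing worker reports contains only the assembly jar, which suggests the
value you set via SPARK_YARN_USER_ENV never reaches the YARN containers. As
a sketch of a possible workaround (I haven't verified this against Zeppelin
0.5.6), you could try passing it through Spark's spark.executorEnv.*
mechanism instead, e.g. in conf/spark-defaults.conf:

    spark.executorEnv.PYTHONPATH  /var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip

spark.executorEnv.[EnvironmentVariableName] is Spark's standard way to set
an environment variable on the executor processes; the paths above are just
the ones from your res6 output, collapsed to the two entries PySpark needs
(the python directory and the py4j zip).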