Hi All, I keep getting this error when trying to run PySpark on *Zeppelin 0.5.6*:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, worker92.snc1): org.apache.spark.SparkException:
Error from python worker:
  /usr/local/bin/python2.7: No module named pyspark
PYTHONPATH was:
  /data/vol4/nodemanager/usercache/umehta/filecache/29/spark-assembly-1.5.2-hadoop2.6.0.jar
java.io.EOFException

I read online that the solution for this was to set the PYTHONPATH. Here are my settings, checked from the Spark (Scala) interpreter:

System.getenv().get("MASTER")
System.getenv().get("SPARK_YARN_JAR")
System.getenv().get("HADOOP_CONF_DIR")
System.getenv().get("JAVA_HOME")
System.getenv().get("SPARK_HOME")
System.getenv().get("PYSPARK_PYTHON")
System.getenv().get("PYTHONPATH")
System.getenv().get("ZEPPELIN_JAVA_OPTS")

res0: String = null
res1: String = null
res2: String = /etc/hadoop/conf
res3: String = null
res4: String = /var/umehta/spark-1.5.2
res5: String = python2.7
res6: String = /var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/build:
res7: String = -Dhdp.version=2.2.0.0-2041

And lastly, here is my zeppelin-env.sh:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041"

export ZEPPELIN_LOG_DIR=/home/umehta/zeppelin-data/logs
export ZEPPELIN_PID_DIR=/home/umehta/zeppelin-data/run
export ZEPPELIN_NOTEBOOK_DIR=/home/umehta/zeppelin-data/notebook
export ZEPPELIN_IDENT_STRING=umehta

export ZEPPELIN_CLASSPATH_OVERRIDES=:/usr/hdp/current/share/lzo/0.6.0/lib/*

export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH,PYTHONPATH=${PYTHONPATH}"
export PYSPARK_PYTHON=python2.7

Has anyone else faced this issue, and do you have any pointers on where I might be going wrong? The problem is only with PySpark; Spark and Spark SQL run fine.

Thanks in advance,
Udit
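
P.S. For reference, here is a small diagnostic paragraph I plan to run in Zeppelin to see what the driver-side Python process actually picks up. This is just a sketch assuming the %pyspark interpreter is available; the failing PYTHONPATH in the stack trace comes from the YARN worker, which may of course differ from whatever this prints:

%pyspark
# Print the environment as seen by the Zeppelin PySpark driver process.
# The worker-side PYTHONPATH shown in the error may still be different.
from __future__ import print_function
import os
import sys

print("PYTHONPATH     :", os.environ.get("PYTHONPATH"))
print("SPARK_HOME     :", os.environ.get("SPARK_HOME"))
print("PYSPARK_PYTHON :", os.environ.get("PYSPARK_PYTHON"))
print("python binary  :", sys.executable)
for entry in sys.path:
    print("sys.path       :", entry)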