I actually figured it out. All I had to do was add the config spark.yarn.isPython=true in the interpreter settings or in zeppelin-env.sh. This is from a PR I came across: https://github.com/apache/incubator-zeppelin/pull/605
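For anyone who hits the same error, this is roughly what the change looks like in practice (a sketch based on my setup, not a general recipe). Either add the property in the Spark interpreter settings in the Zeppelin UI:

  spark.yarn.isPython    true

or set it in zeppelin-env.sh; assuming -D properties in ZEPPELIN_JAVA_OPTS are passed through to the Spark configuration the same way the -Dhdp.version flag below is, that would be:

  export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041 -Dspark.yarn.isPython=true"

As I understand it, with that flag set Spark ships its Python libraries to the YARN containers, so the executors can find the pyspark module instead of only the assembly jar that shows up in the PYTHONPATH of the error below.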
On Thu, May 19, 2016 at 4:15 PM, Paul Bustios Belizario <pbust...@gmail.com> wrote:

> Hi Udit,
>
> Seems like you are trying to import pyspark. What was the code you tried
> to execute? Could you share it?
>
> Paul
>
> On Thu, May 19, 2016 at 7:46 PM Udit Mehta <ume...@groupon.com> wrote:
>
>> Hi All,
>>
>> I keep getting this error when trying to run PySpark on Zeppelin 0.5.6:
>>
>>> Py4JJavaError: An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.runJob. :
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 0.0 (TID 3, worker92.snc1): org.apache.spark.SparkException: Error from
>>> python worker: /usr/local/bin/python2.7: No module named pyspark
>>> PYTHONPATH was:
>>> /data/vol4/nodemanager/usercache/umehta/filecache/29/spark-assembly-1.5.2-hadoop2.6.0.jar
>>> java.io.EOFException
>>
>> I read online that the solution for this was to set the PYTHONPATH. Here
>> are my settings:
>>
>>> System.getenv().get("MASTER")
>>> System.getenv().get("SPARK_YARN_JAR")
>>> System.getenv().get("HADOOP_CONF_DIR")
>>> System.getenv().get("JAVA_HOME")
>>> System.getenv().get("SPARK_HOME")
>>> System.getenv().get("PYSPARK_PYTHON")
>>> System.getenv().get("PYTHONPATH")
>>> System.getenv().get("ZEPPELIN_JAVA_OPTS")
>>>
>>> res0: String = null
>>> res1: String = null
>>> res2: String = /etc/hadoop/conf
>>> res3: String = null
>>> res4: String = /var/umehta/spark-1.5.2
>>> res5: String = python2.7
>>> res6: String = /var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip:/var/umehta/spark-1.5.2/python:/var/umehta/spark-1.5.2/python/build:
>>> res7: String = -Dhdp.version=2.2.0.0-2041
>>
>> And lastly here is my zeppelin-env.sh:
>>
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041"
>>>
>>> export ZEPPELIN_LOG_DIR=/home/umehta/zeppelin-data/logs
>>> export ZEPPELIN_PID_DIR=/home/umehta/zeppelin-data/run
>>> export ZEPPELIN_NOTEBOOK_DIR=/home/umehta/zeppelin-data/notebook
>>> export ZEPPELIN_IDENT_STRING=umehta
>>>
>>> export ZEPPELIN_CLASSPATH_OVERRIDES=:/usr/hdp/current/share/lzo/0.6.0/lib/*
>>>
>>> export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
>>> export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH,PYTHONPATH=${PYTHONPATH}"
>>> export PYSPARK_PYTHON=python2.7
>>
>> Has anyone else faced this issue, and does anyone have pointers on where I
>> might be going wrong? The problem is only with PySpark, while Spark and
>> Spark SQL run fine.
>>
>> Thanks in advance,
>> Udit