Hi,

I've been trying, without success, to get the pyspark interpreter working in
Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter
from Zeppelin without issue. Here are the lines that aren't commented out
in my zeppelin-env.sh:

export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH

Running even a simple pyspark script in the interpreter fails.
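For illustration, a paragraph as trivial as this triggers it (sc is the
SparkContext that Zeppelin provides; any RDD action seems to do it):

  %pyspark
  sc.parallelize(range(10)).count()

It fails with this error: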

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
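The last two lines seem to be the crux: the python worker on the YARN node
gets a PYTHONPATH containing only the Spark assembly jar, so /usr/bin/python
there can't import pyspark. Do the pyspark sources need to be shipped to the
containers explicitly? I've been guessing at something like this in
zeppelin-env.sh (the paths and py4j version are taken from my Spark 1.4.1
install, so treat them as guesses):

  # guess: put the pyspark and py4j zips on the executors' PYTHONPATH
  export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

and/or distributing the zips to the YARN containers via the interpreter
settings:

  # guess: ship the zips with the application
  spark.yarn.dist.files /usr/hdp/current/spark-client/python/lib/pyspark.zip,/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip

but no combination has worked for me so far.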

More details can be found here:
https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html

Thanks,

Ian
