Hi, I've been trying, without success, to configure the pyspark interpreter in Zeppelin. I can use pyspark from the CLI, and the Spark interpreter works from Zeppelin without issue. Here are the uncommented lines in my zeppelin-env.sh:
export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH

Running a simple pyspark script in the interpreter gives this error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar

More details can be found here: https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html

Thanks,
Ian
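P.S. By "a simple pyspark script" I mean something along these lines (an illustrative sketch, not my exact paragraph; presumably any action that ships a task to an executor would hit the same failure, since the error comes from the python worker on the YARN node):

%pyspark
# sc is the SparkContext that Zeppelin's pyspark interpreter provides
rdd = sc.parallelize(range(10))
print(rdd.count())  # fails on the worker with "No module named pyspark"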