Jonathan Esterhazy created ZEPPELIN-1097:
--------------------------------------------
             Summary: pyspark interpreter doesn't work when spark authentication is enabled
                 Key: ZEPPELIN-1097
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1097
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.5.6
         Environment: aws emr (emr-4.7.1), spark 1.6.1, zeppelin 0.5.6
            Reporter: Jonathan Esterhazy

The pyspark interpreter can't run code on the executors when Spark authentication is enabled. All pyspark code fails with "/usr/bin/python: No module named pyspark" errors on the executors. The same python/pyspark code works correctly on a different cluster with the same configuration minus Spark authentication.

Code to reproduce:

{code}
%pyspark
words = sc.textFile("s3://elasticmapreduce/samples/wordcount/input")
filtered = words.filter(lambda w: "CIA" in w).take(5)
print filtered
{code}

More error detail:

{noformat}
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 1.0 failed 4 times, most recent failure: Lost task 6.3 in stage 1.0 (TID 30, ip-172-30-52-161.ec2.internal): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /mnt/encrypted/yarn/usercache/zeppelin/filecache/23/spark-assembly-1.6.1-hadoop2.7.2-amzn-2.jar:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
...
Caused by: org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /mnt/encrypted/yarn/usercache/zeppelin/filecache/23/spark-assembly-1.6.1-hadoop2.7.2-amzn-2.jar:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n', JavaObject id=o139), <traceback object at 0x7fb93824a3b0>)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
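For context, this is a minimal sketch of the Spark authentication settings assumed to be in effect on the failing cluster. The property names come from Spark's standard security configuration; the secret value is a placeholder, and the report above does not show the actual configuration used.

{noformat}
# spark-defaults.conf (illustrative; only the authentication-related properties)
spark.authenticate          true
spark.authenticate.secret   <shared-secret>
{noformat}

Per the report, removing these properties (i.e. running the same cluster without authentication) makes the same pyspark code succeed.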