Jonathan Esterhazy created ZEPPELIN-1097:
--------------------------------------------
             Summary: pyspark interpreter doesn't work when spark authentication is enabled
                 Key: ZEPPELIN-1097
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1097
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.5.6
         Environment: aws emr (emr-4.7.1), spark 1.6.1, zeppelin 0.5.6
            Reporter: Jonathan Esterhazy

The pyspark interpreter can't run code on the executors when Spark authentication is enabled. All pyspark code fails with "/usr/bin/python: No module named pyspark" errors on the executors. The same python/pyspark code works correctly on a different cluster with the same configuration minus Spark authentication.

Code to reproduce:

{code}
%pyspark
words = sc.textFile("s3://elasticmapreduce/samples/wordcount/input")
filtered = words.filter(lambda w: "CIA" in w).take(5)
print filtered
{code}

More error detail:

{noformat}
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 1.0 failed 4 times, most recent failure: Lost task 6.3 in stage 1.0 (TID 30, ip-172-30-52-161.ec2.internal): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /mnt/encrypted/yarn/usercache/zeppelin/filecache/23/spark-assembly-1.6.1-hadoop2.7.2-amzn-2.jar:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
...
Caused by: org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /mnt/encrypted/yarn/usercache/zeppelin/filecache/23/spark-assembly-1.6.1-hadoop2.7.2-amzn-2.jar:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip:/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/lib/pyspark.zip
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n', JavaObject id=o139), <traceback object at 0x7fb93824a3b0>)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
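For context, this is a minimal sketch of the Spark authentication settings assumed to be in effect on the failing cluster. The property names come from Spark's standard security configuration; the secret value is a placeholder, and the report above does not show the actual configuration used.

{noformat}
# spark-defaults.conf (illustrative; only the authentication-related properties)
spark.authenticate          true
spark.authenticate.secret   <shared-secret>
{noformat}

Per the report, removing these properties (i.e. running the same cluster without authentication) makes the same pyspark code succeed.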