What's your master? yarn-client or local?
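The master matters here because the Python worker that fails with "No module named pyspark" is started wherever the executors run: with a local master everything inherits the Zeppelin host's environment, while with yarn-client the workers are launched on the YARN nodes and it is their PYTHONPATH that has to contain pyspark. A quick way to confirm what the interpreter is actually using is a paragraph like the sketch below (not from the thread, just an illustration):

%pyspark
import os
# Which master is this interpreter bound to, and what does the driver-side
# Python process see on its PYTHONPATH? (Driver side only; executors differ.)
print("master: " + sc.master)
print("driver PYTHONPATH: " + str(os.environ.get("PYTHONPATH")))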
Error:

Py4JJavaError: An error occurred while calling o76.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, dn05.prod2.everstring.com): org.apache.spark.SparkException:
Error from python worker:
  /usr/local/bin/python: No module named pyspark
PYTHONPATH was:
  /media/ebs15/hadoop/yarn/local/usercache/hdfs/filecache/1455/spark-assembly-1.5.2-hadoop2.6.0.jar
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1.apply(python.scala:397)
    at org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1.apply(python.scala:362)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

> On Dec 8, 2015, at 10:51 PM, moon soo Lee <m...@apache.org> wrote:
>
> I can run
>
> %spark
> case class Data(name:String)
> val data = sc.parallelize(Array(Data("hello"), Data("world"))).toDF
> data.registerTempTable("test_table")
>
> %pyspark
> from pyspark.sql.types import BooleanType
> sqlContext.udf.register("is_empty", lambda x: True if not x else False, BooleanType())
>
> %pyspark
> sqlContext.sql("select is_empty(name) as name from test_table limit 10").show()
>
> without error. Can you share what kind of error you see?
>
> Thanks,
> moon
>
> On Tue, Dec 8, 2015 at 10:29 PM Fengdong Yu <fengdo...@everstring.com> wrote:
> Moon,
>
> I can run the same code in the pyspark shell, but it fails on Zeppelin.
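Worth noting in the traceback above: on the executor (dn05) PYTHONPATH contained only the spark-assembly jar, with no pyspark.zip or py4j entry, which is exactly why the Python worker launched there cannot import pyspark. One way to tell whether that is specific to Zeppelin or general to yarn-client on this cluster is to force a Python worker onto the executors from the plain pyspark shell started with the same master. The invocation and snippet below are only a sketch under that assumption, not something posted in the thread:

# Started as, for example:  pyspark --master yarn-client
# Any action that needs a Python worker on the executors will do:
rdd = sc.parallelize(range(4), 4)
print(rdd.map(lambda x: x * x).collect())
# If this also fails with "No module named pyspark", the executors are missing
# pyspark regardless of Zeppelin; if it works, the difference lies in how the
# Zeppelin interpreter process is launched and what it ships to YARN.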
>
>> On Dec 8, 2015, at 7:43 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> Tried with the 0.5.5-incubating release after adding SPARK_1_5_2 in
>> spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java.
>>
>> My conf/zeppelin-env.sh has only SPARK_HOME, pointing to the Spark 1.5.2
>> distribution. And I was able to run %pyspark without any problem.
>>
>> When you run
>>
>> System.getenv("PYTHONPATH")
>>
>> in the notebook, what do you see? Can you check whether those files and
>> dirs exist?
>>
>> Thanks,
>> moon
>>
>> On Tue, Dec 8, 2015 at 6:22 PM Fengdong Yu <fengdo...@everstring.com> wrote:
>> I tried; the same error now.
>>
>> I even tried removing spark.yarn.jar in interpreter.json; it's still the
>> same error.
>>
>>> On Dec 8, 2015, at 5:07 PM, moon soo Lee <leemoon...@gmail.com> wrote:
>>>
>>> Can you try setting only SPARK_HOME, without PYTHONPATH?
>>>
>>> Thanks,
>>> moon
>>>
>>> On Tue, Dec 8, 2015 at 6:04 PM Amjad ALSHABANI <ashshab...@gmail.com> wrote:
>>> Hello,
>>>
>>> Are you sure that you've installed the pyspark module?
>>>
>>> Please check your Spark installation directory to see whether it has the
>>> python sub-directory.
>>> Amjad
>>>
>>> On Dec 8, 2015 9:55 AM, "Fengdong Yu" <fengdo...@everstring.com> wrote:
>>> Hi,
>>>
>>> I am using Zeppelin-0.5.5 with Spark 1.5.2.
>>>
>>> It cannot find the pyspark module:
>>>
>>> Error from python worker:
>>>   /usr/local/bin/python: No module named pyspark
>>> PYTHONPATH was:
>>>
>>> I've configured pyspark in zeppelin-env.sh:
>>>
>>> export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$SPARK_HOME/python/lib/pyspark.zip
>>>
>>> Anything else I skipped? Thanks
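Following up on moon's suggestion to check System.getenv("PYTHONPATH") and whether the files it lists exist: the same check can be done from a %pyspark paragraph. The sketch below is not from the thread; note it only inspects the Zeppelin/driver host, while the traceback at the top shows the missing entries are on an executor (dn05):

%pyspark
import os
# Driver-side check only: list each PYTHONPATH entry and whether it exists.
for entry in (os.environ.get("PYTHONPATH") or "").split(os.pathsep):
    if entry:
        print("%s -> %s" % (entry, "exists" if os.path.exists(entry) else "MISSING"))

# And confirm where pyspark itself is loaded from on the driver.
import pyspark
print("pyspark loaded from: " + pyspark.__file__)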