Since you're using YARN, you may also need to set SPARK_YARN_USER_ENV to "PYSPARK_PYTHON=/your/desired/python/on/slave/nodes".
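For example, in conf/spark-env.sh on the node you submit from (the /opt/anaconda path below is only an illustration; point it at wherever Anaconda's python actually lives, and it must exist at that same path on every node):

    # Interpreter used by the PySpark driver (hypothetical path):
    export PYSPARK_PYTHON=/opt/anaconda/bin/python
    # Propagate the same choice to the executors in the YARN containers:
    export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/opt/anaconda/bin/python"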
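And for the question below about checking which Python is used at runtime: a quick sanity check is to compare sys.version on the driver with what a worker reports. A minimal sketch, assuming an already-created SparkContext named sc:

    import sys

    def worker_version(_):
        # This runs inside a worker's Python process, not the driver.
        import sys as worker_sys
        return worker_sys.version

    print(sys.version)  # interpreter running the driver
    # Run one trivial task and report the interpreter running a worker:
    print(sc.parallelize([0], 1).map(worker_version).first())

If the two differ (e.g. 2.7.7 on the driver, 2.6.6 on the workers), you get exactly the "SystemError: unknown opcode" failures shown below, because function bytecode pickled by a 2.7 driver cannot be executed by a 2.6 worker.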
2014-09-04 9:59 GMT-07:00 Davies Liu <dav...@databricks.com>:
> Hey Oleg,
>
> In PySpark, you MUST have the same version of Python on all the
> machines of the cluster: when you run `python` on each of them, they
> should all report the same version (2.6 or 2.7).
>
> With PYSPARK_PYTHON, you can run PySpark with a specific version of
> Python. That version must also be installed on all the machines, in
> the same location.
>
> Davies
>
> On Thu, Sep 4, 2014 at 9:25 AM, Oleg Ruchovets <oruchov...@gmail.com>
> wrote:
> > Hi,
> > I am evaluating PySpark.
> > I have HDP (Hortonworks) installed with Python 2.6.6 (I can't remove
> > it, since Hortonworks depends on it). I can successfully execute
> > PySpark on YARN.
> >
> > We need to use Anaconda packages, so I installed Anaconda. Anaconda
> > comes with Python 2.7.7, which is on the PATH of every machine. After
> > installing Anaconda, the Pi example, which I had been using to test
> > PySpark on YARN, stopped working.
> >
> > Questions:
> > How can PySpark be used with two Python versions on one machine?
> > How can I check which Python version (e.g. 2.7.7) PySpark uses at
> > runtime?
> >
> > The exceptions I get are the same as in my previous emails:
> >
> > [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
> > ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g
> > --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py
> > 1000
> > /usr/jdk64/jdk1.7.0_45/bin/java
> > ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
> > -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
> > 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to: root
> > 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
> >   authentication disabled; ui acls disabled; users with view permissions: Set(root)
> > 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
> > 14/09/04 12:53:12 INFO Remoting: Starting remoting
> > 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on addresses
> >   :[akka.tcp://sp...@hdop-b.agt:45747]
> > 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
> >   [akka.tcp://sp...@hdop-b.agt:45747]
> > 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
> > 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
> > 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local directory at
> >   /tmp/spark-local-20140904125312-c7ea
> > 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
> >   capacity 2.3 GB.
> > 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port 37363
> >   with id = ConnectionManagerId(HDOP-B.AGT,37363)
> > 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register BlockManager
> > 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block manager
> >   HDOP-B.AGT:37363 with 2.3 GB RAM
> > 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered BlockManager
> > 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
> > 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
> > 14/09/04 12:53:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33547
> > 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server started at
> >   http://10.193.1.76:33547
> > 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server directory is
> >   /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
> > 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
> > 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
> > 14/09/04 12:53:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54594
> > 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
> > 14/09/04 12:53:13 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
> > 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
> > 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load native-hadoop
> >   library for your platform... using builtin-java classes where applicable
> > --args is deprecated. Use --arg instead.
> > 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager at
> >   HDOP-N1.AGT/10.193.1.72:8050
> > 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
> >   ApplicationsManager (ASM), number of NodeManagers: 6
> > 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
> >   queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
> >   queueApplicationCount = 0, queueChildQueueCount = 0
> > 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single resource
> >   in this cluster 13824
> > 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
> > 14/09/04 12:53:15 INFO yarn.Client: Uploading
> >   file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> >   to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> > 14/09/04 12:53:17 INFO yarn.Client: Uploading
> >   file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
> >   to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
> > 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
> > 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch context
> > 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
> >   ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
> >   -Djava.io.tmpdir=$PWD/tmp,
> >   -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
> >   -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
> >   -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
> >   -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
> >   -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
> >   -Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
> >   -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
> >   -Dspark.executor.cores=\"1\",
> >   -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
> >   -Dlog4j.configuration=log4j-spark-container.properties,
> >   org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar,
> >   null, --args 'HDOP-B.AGT:45747', --executor-memory, 2048,
> >   --executor-cores, 1, --num-executors, 3, 1>, <LOG_DIR>/stdout, 2>,
> >   <LOG_DIR>/stderr)
> > 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
> > 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
> >   application_1409805761292_0005
> > 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
> >   report from ASM:
> >   appMasterRpcPort: -1
> >   appStartTime: 1409806397305
> >   yarnAppState: ACCEPTED
> > [snip: three further identical ACCEPTED reports at 12:53:18 through 12:53:20]
> > 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
> >   report from ASM:
> >   appMasterRpcPort: 0
> >   appStartTime: 1409806397305
> >   yarnAppState: RUNNING
> > 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
> >   YarnClientClusterScheduler.postStartHook done
> > 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
> >   executor:
> >   Actor[akka.tcp://sparkexecu...@hdop-n1.agt:40024/user/Executor#2065794895]
> >   with ID 1
> > 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block manager
> >   HDOP-N1.AGT:34857 with 1178.1 MB RAM
> > 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
> >   executor:
> >   Actor[akka.tcp://sparkexecu...@hdop-n4.agt:49234/user/Executor#820272849]
> >   with ID 3
> > 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
> >   executor:
> >   Actor[akka.tcp://sparkexecu...@hdop-m.agt:38124/user/Executor#715249825]
> >   with ID 2
> > 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
> >   HDOP-N4.AGT:43365 with 1178.1 MB RAM
> > 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
> >   HDOP-M.AGT:45711 with 1178.1 MB RAM
> > 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
> >   /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
> >   /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> >   with 1000 output partitions (allowLocal=false)
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage 0 (reduce at
> >   /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
> >   (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
> > 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks
> >   from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
> > 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding task set
> >   0.0 with 1000 tasks
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
> >   0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> >   369810 bytes in 5 ms
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
> >   1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> >   506275 bytes in 2 ms
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
> >   2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> >   501135 bytes in 2 ms
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID
> >   3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> >   506275 bytes in 5 ms
> > 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
> > 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
> >   org.apache.spark.api.python.PythonException
> > org.apache.spark.api.python.PythonException: Traceback (most recent call last):
> >   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> >     line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> >     line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> >     line 123, in dump_stream
> >     for obj in iterator:
> >   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> >     line 180, in _batched
> >     for item in iterator:
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> >     line 612, in func
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> >     line 36, in f
> > SystemError: unknown opcode
> >
> >   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> >   at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
> >   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> >   at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:744)
> >
> > [snip: the scheduler retried tasks 0.0:0 through 0.0:3 (TIDs 4-15), and each
> >  retry failed with the identical "SystemError: unknown opcode" traceback,
> >  logged as duplicate 1 through duplicate 14]
> > 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4 times;
> >   aborting job
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
> > 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
> >   /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> > Traceback (most recent call last):
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> >     line 38, in <module>
> >     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> >     line 619, in reduce
> >     vals = self.mapPartitions(func).collect()
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> >     line 583, in collect
> >     bytesInJava = self._jrdd.collect().iterator()
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
> >     line 537, in __call__
> >   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
> >     line 300, in get_return_value
> > 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
> >   org.apache.spark.TaskKilledException
> > org.apache.spark.TaskKilledException
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:744)
> > py4j.protocol.Py4JJavaError: An error occurred while calling o24.collect.
> > : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> >   0.0:2 failed 4 times, most recent failure: Exception failure in TID 12 on
> >   host HDOP-M.AGT: org.apache.spark.api.python.PythonException:
> >   [same "SystemError: unknown opcode" Python traceback and executor stack
> >    trace as above]
> > Driver stacktrace:
> >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> >   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >   at scala.Option.foreach(Option.scala:236)
> >   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> >   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> >   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
> >   0.0, whose tasks have all completed, from pool
> >
> > thanks
> > Oleg.