Hi,
I changed the master to point at YARN and got the exceptions below. The command:

[root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit --master yarn://HDOP-M.AGT:8032 --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
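For reference, the yarn://host:port master URL used above is a legacy form; Spark 1.0.x documents the yarn-client and yarn-cluster master values, and takes the ResourceManager address from the Hadoop configuration directory rather than from the URL. A hedged equivalent of the command, assuming /etc/hadoop/conf holds the cluster config (as the classpath in the output below suggests):

    # Equivalent submission using the documented master values for
    # Spark 1.0.x on YARN; the ResourceManager address comes from the
    # Hadoop config dir, not from the master URL. The config path is
    # an assumption based on the classpath in the log below.
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    ./bin/spark-submit --master yarn-client \
      --num-executors 3 --driver-memory 4g \
      --executor-memory 2g --executor-cores 1 \
      examples/src/main/python/pi.py 1000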
/usr/jdk64/jdk1.7.0_45/bin/java ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
14/09/03 13:33:29 INFO spark.SecurityManager: Changing view acls to: root
14/09/03 13:33:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/09/03 13:33:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/03 13:33:30 INFO Remoting: Starting remoting
14/09/03 13:33:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-b.agt:49765]
14/09/03 13:33:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-b.agt:49765]
14/09/03 13:33:31 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/03 13:33:31 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/03 13:33:31 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140903133331-205a
14/09/03 13:33:31 INFO storage.MemoryStore: MemoryStore started with capacity 2.3 GB.
14/09/03 13:33:31 INFO network.ConnectionManager: Bound socket to port 54486 with id = ConnectionManagerId(HDOP-B.AGT,54486)
14/09/03 13:33:31 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/09/03 13:33:31 INFO storage.BlockManagerInfo: Registering block manager HDOP-B.AGT:54486 with 2.3 GB RAM
14/09/03 13:33:31 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/03 13:33:31 INFO spark.HttpServer: Starting HTTP Server
14/09/03 13:33:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 13:33:31 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60199
14/09/03 13:33:31 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.193.1.76:60199
14/09/03 13:33:31 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-67574874-3b14-4c8d-b075-580061d140e0
14/09/03 13:33:31 INFO spark.HttpServer: Starting HTTP Server
14/09/03 13:33:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 13:33:31 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45848
14/09/03 13:33:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 13:33:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/09/03 13:33:31 INFO ui.SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
14/09/03 13:33:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
--args is deprecated. Use --arg instead.
14/09/03 13:33:32 INFO client.RMProxy: Connecting to ResourceManager at HDOP-N1.AGT/10.193.1.72:8050
14/09/03 13:33:33 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 6
14/09/03 13:33:33 INFO yarn.Client: Queue info ... queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/09/03 13:33:33 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 13824
14/09/03 13:33:33 INFO yarn.Client: Preparing Local resources
14/09/03 13:33:33 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0032/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
14/09/03 13:33:35 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0032/pi.py
14/09/03 13:33:35 INFO yarn.Client: Setting up the launch environment
14/09/03 13:33:35 INFO yarn.Client: Setting up container launch context
14/09/03 13:33:35 INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-52943421-004b-4ae7-990f-d8591a830ef8\", -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\", -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\", -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\", -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\", -Dspark.fileserver.uri=\"http://10.193.1.76:45848\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"49765\", -Dspark.executor.cores=\"1\", -Dspark.httpBroadcast.uri=\"http://10.193.1.76:60199\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null, --args 'HDOP-B.AGT:49765', --executor-memory, 2048, --executor-cores, 1, --num-executors, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/09/03 13:33:35 INFO yarn.Client: Submitting application to ASM
14/09/03 13:33:35 INFO impl.YarnClientImpl: Submitted application application_1409559972905_0032
14/09/03 13:33:35 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:36 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:37 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:38 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:39 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:40 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1, appStartTime: 1409722415745, yarnAppState: ACCEPTED
14/09/03 13:33:41 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: 0, appStartTime: 1409722415745, yarnAppState: RUNNING
14/09/03 13:33:43 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/09/03 13:33:45 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-b.agt:45614/user/Executor#1371263571] with ID 2
14/09/03 13:33:45 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-n1.agt:34106/user/Executor#1776898824] with ID 1
14/09/03 13:33:45 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-m.agt:34327/user/Executor#8093279] with ID 3
14/09/03 13:33:45 INFO storage.BlockManagerInfo: Registering block manager HDOP-B.AGT:43556 with 1178.1 MB RAM
14/09/03 13:33:45 INFO storage.BlockManagerInfo: Registering block manager HDOP-N1.AGT:35149 with 1178.1 MB RAM
14/09/03 13:33:46 INFO storage.BlockManagerInfo: Registering block manager HDOP-M.AGT:53383 with 1178.1 MB RAM
14/09/03 13:34:15 INFO spark.SparkContext: Starting job: reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Got job 0 (reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38) with 1000 output partitions (allowLocal=false)
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Missing parents: List()
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
14/09/03 13:34:15 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
14/09/03 13:34:15 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 1000 tasks
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 8 ms
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor 3: HDOP-M.AGT (PROCESS_LOCAL)
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 5 ms
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 2 on executor 2: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 13:34:15 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501135 bytes in 4 ms
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 3 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 5 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 1]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 5 on executor 3: HDOP-M.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 5 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 6 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 5 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:0)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 2]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 7 on executor 3: HDOP-M.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:3)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 1]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 8 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:1)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 3]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 9 on executor 3: HDOP-M.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:0)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 2]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 10 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:3)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 4]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 11 on executor 3: HDOP-M.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 3]
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 12 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 3 ms
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:0)
14/09/03 13:34:16 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
 [duplicate 5]
14/09/03 13:34:16 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 4 times; aborting job
14/09/03 13:34:16 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
14/09/03 13:34:16 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
14/09/03 13:34:16 INFO scheduler.DAGScheduler: Failed to run reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
Traceback (most recent call last):
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
    count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 619, in reduce
    vals = self.mapPartitions(func).collect()
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 583, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o24.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 10 on host HDOP-N1.AGT: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/21/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
SystemError: unknown opcode
        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/09/03 13:34:16 WARN scheduler.TaskSetManager: Task 11 was killed.
14/09/03 13:34:17 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.TaskKilledException
org.apache.spark.TaskKilledException
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

What should I do to resolve the issue?

Thanks,
Oleg.
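A note on the failure itself: "SystemError: unknown opcode" from a PySpark worker typically means the executors run a different Python version than the driver, because the mapped function is shipped as pickled CPython bytecode and bytecode is not portable across interpreter versions. A hedged diagnostic across the hosts that appear in the log above (the interpreter path in the fix is only an example):

    # Compare interpreter versions on the hosts from the log above;
    # any mismatch would explain the unknown-opcode failures.
    # (python -V prints to stderr on Python 2, hence the redirect.)
    for h in HDOP-B.AGT HDOP-M.AGT HDOP-N1.AGT; do
      printf '%s: ' "$h"; ssh "$h" 'python -V' 2>&1
    done

    # If the versions differ, point PySpark at one interpreter
    # everywhere before submitting (the path is an assumption):
    export PYSPARK_PYTHON=/usr/bin/python2.7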
On Wed, Sep 3, 2014 at 12:51 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Hi,
> I changed my command to:
>
> ./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
>
> and it fixed the problem.
>
> I still have a couple of questions:
> PROCESS_LOCAL is not YARN execution, right? How should I configure running on YARN? Should I execute the start-all script on all machines or only on one? Where are the UI / logs of the Spark execution?
>
> Task summary from the UI (index, ID, status, locality, executor, launch time, duration, GC/serialization times):
>
> 152  152  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:14  0.2 s
> 0    0    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s  39 ms
> 2    2    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s  39 ms
> 3    3    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s  39 ms  1 ms
> 4    4    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s  39 ms  2 ms
> 5    5    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s  39 ms  1 ms
> 6    6    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s  1 ms
> 7    7    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s
> 8    8    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
> 9    9    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.4 s
> 10   10   SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s  1 ms
> 11   11   SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
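On the questions just quoted, a hedged pointer, assuming default ports: PROCESS_LOCAL is a task data-locality level, not a deployment mode, so it appears under any master. While an application runs, its own UI is served by the driver on port 4040 (the submit log prints the exact URL); the standalone master UI is on 8080 and YARN's ResourceManager UI on 8088. For a finished YARN application, the aggregated container logs can be pulled with the yarn CLI:

    # Fetch aggregated executor logs for the application id shown in
    # the submit log earlier in this thread; this assumes
    # yarn.log-aggregation-enable=true on the cluster.
    yarn logs -applicationId application_1409559972905_0032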
On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

>> Hi Andrew.
>> What should I do to set the master to YARN? Can you please point me to a command or documentation on how to do it?
>>
>> I am doing the following:
>> I executed start-all.sh:
>>
>> [root@HDOP-B sbin]# ./start-all.sh
>> starting org.apache.spark.deploy.master.Master, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>> localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>>
>> and then executed the command:
>>
>> ./bin/spark-submit --master spark://HDOP-B.AGT:7077 examples/src/main/python/pi.py 1000
>>
>> The result is the following:
>>
>> /usr/jdk64/jdk1.7.0_45/bin/java ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
>> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
>> 14/09/03 12:10:07 INFO Remoting: Starting remoting
>> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-b.agt:38944]
>> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-b.agt:38944]
>> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
>> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
>> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140903121008-cf09
>> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
>> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041 with id = ConnectionManagerId(HDOP-B.AGT,45041)
>> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register BlockManager
>> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:45041 with 294.9 MB RAM
>> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at http://10.193.1.76:59336
>> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
>> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 14/09/03 12:10:09 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
>> 14/09/03 12:10:09 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master spark://HDOP-B.AGT:7077...
>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140903121009-0000
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added: app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161 (HDOP-B.AGT:51161) with 8 cores
>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0 MB RAM
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated: app-20140903121009-0000/0 is now RUNNING
>> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-b.agt:38143/user/Executor#1295757828] with ID 0
>> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:38670 with 294.9 MB RAM
>>
>> Traceback (most recent call last):
>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>> : java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>     at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>     at py4j.Gateway.invoke(Gateway.java:259)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>     at java.lang.Thread.run(Thread.java:744)
>>
>> What should I do to fix the issue?
>>
>> Thanks,
>> Oleg.
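Worth noting about the OOM just quoted: the pasted java invocation shows -Xms512m -Xmx512m, i.e. the default 512 MB driver heap, and pi.py with the argument 1000 serializes 100000 x 1000 sample indices through readRDDFromFile inside the driver JVM, which plausibly exhausts that heap. A hedged sketch of the same submission with a larger driver heap (the 4g figure mirrors the run at the top of the thread):

    # Same standalone submission, but with an enlarged driver heap to
    # accommodate parallelize()'s driver-side serialization of the
    # 10^8-element input range.
    ./bin/spark-submit --master spark://HDOP-B.AGT:7077 \
      --driver-memory 4g \
      examples/src/main/python/pi.py 1000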
On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <and...@databricks.com> wrote:

>>> Hi Oleg,
>>>
>>> If you are running Spark on a YARN cluster, you should set --master to yarn. By default this runs in client mode, which redirects all output of your application to your console. This is failing because it is trying to connect to a standalone master that you probably did not start. I am somewhat puzzled as to how you ran into an OOM from this configuration, however. Does this problem still occur if you set the correct master?
>>>
>>> -Andrew
>>>
>>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchov...@gmail.com>:
>>>
>>>> Hi,
>>>> I've installed PySpark on an HDP (Hortonworks) cluster.
>>>> Executing the pi example:
>>>>
>>>> command:
>>>> spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit --master spark://10.193.1.71:7077 examples/src/main/python/pi.py 1000
>>>>
>>>> exception:
>>>>
>>>> 14/09/02 17:34:02 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to: root
>>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-m.agt:41059]
>>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-m.agt:41059]
>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140902173403-cda8
>>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
>>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port 34931 with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register BlockManager
>>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager HDOP-M.AGT:34931 with 294.9 MB RAM
>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at http://10.193.1.71:54341
>>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory is /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at http://HDOP-M.AGT:4040
>>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 14/09/02 17:34:04 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>>> 14/09/02 17:34:04 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>     at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>>     at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>     at py4j.Gateway.invoke(Gateway.java:259)
>>>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>
>>>> Question:
>>>> How can I find out the Spark master host and port? Where are they defined?
>>>>
>>>> Thanks,
>>>> Oleg.
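On that last question, a hedged pointer for the standalone mode used here: the master prints its own URL on startup, so it can be read back from the master log named in the start-all.sh output earlier in the thread, or from the top of the master web UI (port 8080 by default). The host and port come from SPARK_MASTER_IP and SPARK_MASTER_PORT in conf/spark-env.sh, with the port defaulting to 7077 when unset.

    # Recover the standalone master URL from its log; the path matches
    # the one printed by start-all.sh earlier in this thread.
    grep 'Starting Spark master at' \
      /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out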