Hi,

I'm new to Spark. I have built a small Spark-on-YARN cluster with 1 master
(20GB RAM, 8 cores) and 3 workers (4GB RAM, 4 cores each). When I run
sc.parallelize(1 to 1000).count() from $SPARK_HOME/bin/spark-shell, the job
sometimes submits and completes successfully, and sometimes fails with the
exception below.
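
Concretely, this is the entire reproduction; nothing else is typed in the
session (the prompt and command are exactly as they appear in the log at the
end of this mail):

    # bin/spark-shell
    scala> sc.parallelize(1 to 1000).count()   // sometimes returns 1000, sometimes aborts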

I have confirmed on the Spark web UI that all three workers are registered
with the master. The memory-related parameters configured in spark-env.sh are
SPARK_EXECUTOR_MEMORY=2G, SPARK_DRIVER_MEMORY=1G, and SPARK_WORKER_MEMORY=4G.
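
For completeness, here is the memory-related portion of my conf/spark-env.sh
(other settings omitted; the comments just restate my understanding of what
each variable controls):

    # conf/spark-env.sh -- memory-related settings only
    SPARK_EXECUTOR_MEMORY=2G   # heap requested for each executor JVM
    SPARK_DRIVER_MEMORY=1G     # heap for the driver (the spark-shell JVM)
    SPARK_WORKER_MEMORY=4G     # total memory each worker may allocate to executors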

Could anyone give me a hint on how to resolve this? Googling has not turned
up anything useful so far. The full spark-shell session is pasted below.

# bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/11 12:21:39 INFO SecurityManager: Changing view acls to: root,
15/02/11 12:21:39 INFO SecurityManager: Changing modify acls to: root,
15/02/11 12:21:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
15/02/11 12:21:39 INFO HttpServer: Starting HTTP Server
15/02/11 12:21:39 INFO Utils: Successfully started service 'HTTP class server' on port 28968.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.6.0_24)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/11 12:21:43 INFO SecurityManager: Changing view acls to: root,
15/02/11 12:21:43 INFO SecurityManager: Changing modify acls to: root,
15/02/11 12:21:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
15/02/11 12:21:44 INFO Slf4jLogger: Slf4jLogger started
15/02/11 12:21:44 INFO Remoting: Starting remoting
15/02/11 12:21:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@xpan-biqa1:6862]
15/02/11 12:21:44 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@xpan-biqa1:6862]
15/02/11 12:21:44 INFO Utils: Successfully started service 'sparkDriver' on port 6862.
15/02/11 12:21:44 INFO SparkEnv: Registering MapOutputTracker
15/02/11 12:21:44 INFO SparkEnv: Registering BlockManagerMaster
15/02/11 12:21:44 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150211122144-ed26
15/02/11 12:21:44 INFO Utils: Successfully started service 'Connection manager for block manager' on port 40502.
15/02/11 12:21:44 INFO ConnectionManager: Bound socket to port 40502 with id = ConnectionManagerId(xpan-biqa1,40502)
15/02/11 12:21:44 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
15/02/11 12:21:44 INFO BlockManagerMaster: Trying to register BlockManager
15/02/11 12:21:44 INFO BlockManagerMasterActor: Registering block manager xpan-biqa1:40502 with 265.0 MB RAM
15/02/11 12:21:44 INFO BlockManagerMaster: Registered BlockManager
15/02/11 12:21:44 INFO HttpFileServer: HTTP File server directory is /tmp/spark-0a80ce6b-6a05-4163-a97d-07753f627ec8
15/02/11 12:21:44 INFO HttpServer: Starting HTTP Server
15/02/11 12:21:44 INFO Utils: Successfully started service 'HTTP file server' on port 25939.
15/02/11 12:21:44 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/11 12:21:44 INFO SparkUI: Started SparkUI at http://xpan-biqa1:4040
15/02/11 12:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/11 12:21:46 INFO EventLoggingListener: Logging events to hdfs://xpan-biqa1:7020/spark/spark-shell-1423628505431
15/02/11 12:21:46 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
15/02/11 12:21:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/02/11 12:21:46 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> 15/02/11 12:22:06 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...

scala> sc.parallelize(1 to 1000).count()
15/02/11 12:22:24 INFO SparkContext: Starting job: count at <console>:13
15/02/11 12:22:24 INFO DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
15/02/11 12:22:24 INFO DAGScheduler: Final stage: Stage 0(count at <console>:13)
15/02/11 12:22:24 INFO DAGScheduler: Parents of final stage: List()
15/02/11 12:22:24 INFO DAGScheduler: Missing parents: List()
15/02/11 12:22:24 INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(1088) called with curMem=0, maxMem=277842493
15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1088.0 B, free 265.0 MB)
15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(800) called with curMem=1088, maxMem=277842493
15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 800.0 B, free 265.0 MB)
15/02/11 12:22:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xpan-biqa1:40502 (size: 800.0 B, free: 265.0 MB)
15/02/11 12:22:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/11 12:22:24 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
15/02/11 12:22:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/02/11 12:22:26 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
15/02/11 12:22:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/11 12:22:46 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/02/11 12:22:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/11 12:22:46 INFO TaskSchedulerImpl: Cancelling stage 0
15/02/11 12:22:46 INFO DAGScheduler: Failed to run count at <console>:13
15/02/11 12:22:46 INFO SparkUI: Stopped Spark web UI at http://xpan-biqa1:4040
15/02/11 12:22:46 INFO DAGScheduler: Stopping DAGScheduler
15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

scala> 15/02/11 12:22:47 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!


Regards,
Ryan
