Hi Akhil et al.,

I made the following changes:

In spark-env.sh I added the following three entries (standalone mode):

export SPARK_MASTER_IP=pzxnvm2018.x.y.name.org
export SPARK_WORKER_MEMORY=4G
export SPARK_WORKER_CORES=3
I then use start-master and start-slaves commands to start the services. 
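In case it matters, the invocation is roughly the following (paths assume the default Spark layout under $SPARK_HOME; treat it as a sketch rather than a verbatim copy of my commands):

# on the master node
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slaves.sh   # reads conf/slaves for the worker hostnames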
Another thing I have noticed is that the number of cores I specified is not being used: 2022 shows up with only 1 core, while 2023 and 2024 show up with 4 cores.
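My understanding is that each worker daemon reads its own conf/spark-env.sh, so as a sanity check I plan to verify the setting on every worker (again assuming the default layout), e.g.:

grep SPARK_WORKER_CORES $SPARK_HOME/conf/spark-env.sh   # run on pzxnvm2022, 2023 and 2024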
In the Web UI:
URL: spark://pzxnvm2018.x.y.name.org:7077
I run the spark shell command from pzxnvm2018. 
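Roughly like this (the exact flag is from memory, so take it as an approximation):

./bin/spark-shell --master spark://pzxnvm2018.x.y.name.org:7077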
/etc/hosts on my master node has the following entry:
master-ip      pzxnvm2018.x.y.name.org pzxnvm2018

/etc/hosts on my worker node has the following entry:
worker-ip      pzxnvm2023.x.y.name.org pzxnvm2023
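Since the error below mentions spark@localhost, I also plan to double-check that each hostname resolves to the node's real address rather than loopback, e.g. (standard tools, nothing Spark-specific):

hostname -f
getent hosts pzxnvm2023.x.y.name.org   # should not come back as 127.0.0.1 / 127.0.1.1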

However, in the log file on my master node I still see this:
ERROR EndpointWriter: AssociationError 
[akka.tcp://sparkmas...@pzxnvm2018.x.y.name.org:7077] -> 
[akka.tcp://spark@localhost:43569]: Error [Association failed with 
[akka.tcp://spark@localhost:43569]]
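One guess on my side (only a guess) is that the remote endpoint is binding to a loopback address; if that is the case, pinning the bind address in conf/spark-env.sh on each node might help:

# on each node, using that node's own routable address (placeholder below)
export SPARK_LOCAL_IP=worker-ip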
My spark-shell produces the following output:

scala> 14/07/08 10:01:39 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140708100139-0000
14/07/08 10:01:39 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/0 on worker-20140708095558-pzxnvm2024.x.y.name.org-50218 (pzxnvm2024.dcld.pldc.kp.org:50218) with 4 cores
14/07/08 10:01:39 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/0 on hostPort pzxnvm2024.x.y.name.org:50218 with 4 cores, 512.0 MB RAM
14/07/08 10:01:39 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/1 on worker-20140708095559-pzxnvm2023.x.y.name.org-38294 (pzxnvm2023.dcld.pldc.kp.org:38294) with 4 cores
14/07/08 10:01:39 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/1 on hostPort pzxnvm2023.x.y.name.org:38294 with 4 cores, 512.0 MB RAM
14/07/08 10:01:39 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/2 on worker-20140708095559-pzxnvm2022.x.y.name.org-41826 (pzxnvm2022.dcld.pldc.kp.org:41826) with 1 cores
14/07/08 10:01:39 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/2 on hostPort pzxnvm2022.x.y.name.org:41826 with 1 cores, 512.0 MB RAM
14/07/08 10:01:40 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/0 is now RUNNING
14/07/08 10:01:40 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/1 is now RUNNING
14/07/08 10:01:40 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/2 is now RUNNING
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/0 is now FAILED (Command exited with code 1)
14/07/08 10:01:42 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-0000/0 removed: Command exited with code 1
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/3 on worker-20140708095558-pzxnvm2024.x.y.name.org-50218 (pzxnvm2024.dcld.pldc.kp.org:50218) with 4 cores
14/07/08 10:01:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/3 on hostPort pzxnvm2024.x.y.name.org:50218 with 4 cores, 512.0 MB RAM
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/3 is now RUNNING
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/1 is now FAILED (Command exited with code 1)
14/07/08 10:01:42 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-0000/1 removed: Command exited with code 1
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/4 on worker-20140708095559-pzxnvm2023.x.y.name.org-38294 (pzxnvm2023.dcld.pldc.kp.org:38294) with 4 cores
14/07/08 10:01:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/4 on hostPort pzxnvm2023.x.y.name.org:38294 with 4 cores, 512.0 MB RAM
14/07/08 10:01:42 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/4 is now RUNNING
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/2 is now FAILED (Command exited with code 1)
14/07/08 10:01:43 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-0000/2 removed: Command exited with code 1
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/5 on worker-20140708095559-pzxnvm2022.x.y.name.org-41826 (pzxnvm2022.dcld.pldc.kp.org:41826) with 1 cores
14/07/08 10:01:43 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/5 on hostPort pzxnvm2022.x.y.name.org:41826 with 1 cores, 512.0 MB RAM
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/5 is now RUNNING
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/3 is now FAILED (Command exited with code 1)
14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-0000/3 removed: Command exited with code 1
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/6 on worker-20140708095558-pzxnvm2024.x.y.name.org-50218 (pzxnvm2024.dcld.pldc.kp.org:50218) with 4 cores
14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-0000/6 on hostPort pzxnvm2024.x.y.name.org:50218 with 4 cores, 512.0 MB RAM
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/6 is now RUNNING
14/07/08 10:01:45 INFO AppClient$ClientActor: Executor updated: app-20140708100139-0000/4 is now FAILED (Command exited with code 1)
14/07/08 10:01:45 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-0000/4 removed: Command exited with code 1
14/07/08 10:01:45 INFO AppClient$ClientActor: Executor added: app-20140708100139-0000/7 on worker-20140708095559-pzxnvm2023.x.y.name.org-38294 (pzxnvm2023.dcld.pldc.kp.org:38294) with 4 cores
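I will also look at the executor stderr on the workers for the actual reason behind "Command exited with code 1"; if I understand the standalone layout correctly, it should be under the worker's work directory, e.g.:

less $SPARK_HOME/work/app-20140708100139-0000/0/stderr   # on pzxnvm2024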

Date: Tue, 8 Jul 2014 12:29:21 +0530
Subject: Re: Spark: All masters are unresponsive!
From: ak...@sigmoidanalytics.com
To: user@spark.apache.org

Are you sure this is your master URL spark://pzxnvm2018:7077 ?

You can look it up in the WebUI (mostly http://pzxnvm2018:8080) top left 
corner. Also make sure you are able to telnet pzxnvm2018 7077 from the machines 
where you are running the spark shell. 
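For example, from the machine running the spark shell:

telnet pzxnvm2018 7077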
Thanks
Best Regards


On Tue, Jul 8, 2014 at 12:21 PM, Sameer Tilak <ssti...@live.com> wrote:




Hi All,
I am having a few issues with stability and scheduling. When I use spark-shell to submit my application, I get the following error message and spark-shell crashes. I have a small 4-node cluster for PoC. I tried both manual and script-based cluster setup. I tried using the FQDN as well for specifying the master node, but no luck.
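For reference, these are roughly the two forms I have been using to point the shell at the master (exact syntax approximate):

MASTER=spark://pzxnvm2018:7077 ./bin/spark-shell
MASTER=spark://pzxnvm2018.x.y.name.org:7077 ./bin/spark-shell
# my understanding is that the host in this URL has to match exactly what the
# master registered with (i.e. the SPARK_MASTER_IP value shown in the web UI)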

14/07/07 23:44:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[6] at map at JaccardScore.scala:83)
14/07/07 23:44:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:0 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:0 as 2322 bytes in 0 ms
14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:1 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:1 as 2322 bytes in 0 ms
14/07/07 23:44:35 INFO Executor: Running task ID 1
14/07/07 23:44:35 INFO Executor: Running task ID 2
14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
14/07/07 23:44:35 INFO HadoopRDD: Input split: hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:0+97239389
14/07/07 23:44:35 INFO HadoopRDD: Input split: hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:97239389+97239390
14/07/07 23:44:54 INFO AppClient$ClientActor: Connecting to master spark://pzxnvm2018:7077...
14/07/07 23:45:14 INFO AppClient$ClientActor: Connecting to master spark://pzxnvm2018:7077...
14/07/07 23:45:35 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/07/07 23:45:35 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
14/07/07 23:45:35 WARN HadoopRDD: Exception in RecordReader.close()
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
        at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2135)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
        at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:168)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:208)
        at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
        at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:193)
        at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
        at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
14/07/07 23:45:35 ERROR Executor: Exception in task ID 2
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
        at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
        at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)