No, I am not using AWS. I am using a cluster at one of the national labs. But
as I mentioned, I am pretty new to computer science, so I might not be
answering your question right... but yes, port 7077 is accessible.

Maybe I got it wrong from the get-go? I will just write down what I did...

Basically I have a cluster with a bunch of nodes (call them #1 ~ #10), and I
picked one node (call it #1) to be the master (and also one of the workers).

I updated the conf/spark-env.sh file with SPARK_MASTER_IP, SPARK_MASTER_PORT,
SPARK_MASTER_WEBUI_PORT, SPARK_WORKER_CORES, SPARK_WORKER_MEMORY,
SPARK_WORKER_PORT, and SPARK_WORKER_WEBUI_PORT.
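
For reference, my conf/spark-env.sh looks roughly like this (the values here
are placeholders, not my actual settings):

# conf/spark-env.sh on every node (Spark 0.9.1 standalone)
export SPARK_MASTER_IP=node1.cluster       # placeholder hostname for node #1
export SPARK_MASTER_PORT=7077              # the port I said is accessible
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_CORES=8                # placeholder: cores per worker
export SPARK_WORKER_MEMORY=16g             # placeholder: memory per worker
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=8081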

I start the master on #1 with ./sbin/start-master.sh
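
For reference, a quick way I can double-check that the master came up, using
the same placeholder host/ports as in spark-env.sh above:

# run from node #1 after start-master.sh
curl -s http://MASTER_IP:MASTER_WEBUI_PORT > /dev/null && echo "web UI is up"
nc -z MASTER_IP MASTER_PORT && echo "master port is reachable"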

But ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that sshes
into each worker node (#1 ~ #10) and starts a worker there:

# start a Worker on every host listed in hostnames.txt,
# pointing each one at the standalone master
for server in $(cat /somedirectory/hostnames.txt)
do
  ssh "$server" "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class \
    org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT \
    > /somedirectory/nohup.out 2>&1 & exit"
done
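
As I understand it, the stock ./sbin/start-slaves.sh is supposed to do roughly
the same thing: it reads hostnames from conf/slaves on the master (one per
line) and sshes into each one, which needs passwordless ssh from #1 to every
worker. A sketch of that setup (placeholder hostnames, not my real ones):

# conf/slaves on node #1 (one worker hostname per line, #1 ~ #10)
node1.cluster
node2.cluster
...
node10.cluster

# then, from the Spark directory on node #1:
./sbin/start-slaves.sh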


Then I go to #1 and start ./bin/spark-shell, and that's when I get the error
message.
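
The way I understand it, the 0.9.1 shell picks the master up from the MASTER
environment variable, so the launch on #1 looks roughly like this (same
spark:// placeholders as the workers use):

# on node #1, point the shell at the standalone master
MASTER=spark://MASTER_IP:MASTER_PORT ./bin/spark-shell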

Sorry if that made things more confusing...


