No, I am not using AWS; I am using one of the national lab's clusters. As I mentioned, I am pretty new to computer science, so I might not be answering your question right... but port 7077 is accessible.
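If it helps, a quick way to confirm that is something like the following, run from one of the worker nodes ("node1" below is just a stand-in for the master's hostname):

    # zero-I/O connection test against the master's service port
    nc -vz node1 7077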
Maybe I got it wrong from the get-go? I'll just write down what I did.

Basically I have a cluster with a bunch of nodes (call them #1 ~ #10), and I picked one node (call it #1) to be the master (and also one of the workers).

1. I updated conf/spark-env.sh with MASTER_IP, MASTER_PORT, MASTER_WEBUI_PORT, CORES, MEMORY, WORKER_PORT, and WORKER_WEBUI_PORT (a rough sketch of that file is below).

2. I start the master on #1 with ./sbin/start-master.sh.

3. ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that ssh's into each worker node (#1 ~ #10) and starts a worker there:

    for server in $(cat /somedirectory/hostnames.txt); do
      ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class \
        org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT \
        > /somedirectory/nohup.out & exit"
    done

4. Then I go to #1 and start ./bin/spark-shell, and that's when I get that error message.

Sorry if that made it more confusing.
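In case it's useful, the conf/spark-env.sh I'm describing above looks roughly like this. The variable names in that file carry a SPARK_ prefix; the hostname, ports, and sizes below are placeholder values, not my real settings:

    # conf/spark-env.sh (same file on every node); values are placeholders
    export SPARK_MASTER_IP=node1            # hostname/IP of node #1
    export SPARK_MASTER_PORT=7077
    export SPARK_MASTER_WEBUI_PORT=8080
    export SPARK_WORKER_CORES=8             # cores each worker may use
    export SPARK_WORKER_MEMORY=16g          # memory each worker may use
    export SPARK_WORKER_PORT=7078
    export SPARK_WORKER_WEBUI_PORT=8081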