What instance types did you launch on? Sometimes you also get a bad individual machine from EC2. It might help to remove the node it’s complaining about from the conf/slaves file.
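If you want to script that edit, here is a minimal sketch in Python. It is not part of Spark. The helper names `drop_host` and `remove_from_slaves` are made up for illustration, and it assumes conf/slaves lists one hostname per line, with optional `#` comment lines. The hostname in the usage note is just the one from the "Connection refused" lines in your log; whether it is actually listed in your conf/slaves is an assumption.

```python
from pathlib import Path


def drop_host(slaves_text: str, host: str) -> str:
    """Return the contents of a conf/slaves file without `host`.

    Comment lines and blank lines are kept as they are. A line is dropped
    only when its content, ignoring surrounding whitespace, equals `host`.
    """
    kept = []
    for line in slaves_text.splitlines():
        if line.strip() == host:
            continue  # this is the node we want to take out of the list
        kept.append(line)
    return "\n".join(kept) + "\n"


def remove_from_slaves(path: Path, host: str) -> None:
    """Rewrite the slaves file in place, keeping a .bak copy of the original."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(drop_host(original, host))
```

You would call it as, for example, `remove_from_slaves(Path("conf/slaves"), "ip-10-100-75-70.ec2.internal")`. conf/slaves is only read by the launch scripts, so the change takes effect the next time the cluster is started.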
Matei

On May 30, 2014, at 11:18 AM, PJ$ <p...@chickenandwaffl.es> wrote:

> Hey Folks,
>
> I'm really having quite a bit of trouble getting spark running on ec2. I'm
> not using the scripts at https://github.com/apache/spark/tree/master/ec2 because
> I'd like to know how everything works. But I'm going a little crazy. I think
> that something about the networking configuration must be messed up, but I'm
> at a loss. Shortly after starting the cluster, I get a lot of this:
>
> 14/05/30 18:03:22 INFO master.Master: Registering worker
> ip-10-100-184-45.ec2.internal:7078 with 2 cores, 6.3 GB RAM
> 14/05/30 18:03:22 INFO master.Master: Registering worker
> ip-10-100-184-45.ec2.internal:7078 with 2 cores, 6.3 GB RAM
> 14/05/30 18:03:23 INFO master.Master: Registering worker
> ip-10-100-184-45.ec2.internal:7078 with 2 cores, 6.3 GB RAM
> 14/05/30 18:03:23 INFO master.Master: Registering worker
> ip-10-100-184-45.ec2.internal:7078 with 2 cores, 6.3 GB RAM
> 14/05/30 18:05:54 INFO master.Master:
> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated,
> removing it.
> 14/05/30 18:05:54 INFO actor.LocalActorRef: Message
> [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
> Actor[akka://sparkMaster/deadLetters] to
> Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.100.75.70%3A36725-25#847210246]
> was not delivered. [5] dead letters encountered. This logging can be turned
> off or adjusted with configuration settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
> 14/05/30 18:05:54 INFO master.Master:
> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated,
> removing it.
> 14/05/30 18:05:54 INFO master.Master:
> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated,
> removing it.
> 14/05/30 18:05:54 ERROR remote.EndpointWriter: AssociationError
> [akka.tcp://sparkMaster@ip-10-100-184-45.ec2.internal:7077] ->
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]: Error [Association
> failed with [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: ip-10-100-75-70.ec2.internal/10.100.75.70:38485
> ]
> 14/05/30 18:05:54 ERROR remote.EndpointWriter: AssociationError
> [akka.tcp://sparkMaster@ip-10-100-184-45.ec2.internal:7077] ->
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]: Error [Association
> failed with [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: ip-10-100-75-70.ec2.internal/10.100.75.70:38485
> ]
> 14/05/30 18:05:54 INFO master.Master:
> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated,
> removing it.
> 14/05/30 18:05:54 INFO master.Master:
> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated,
> removing it.
> 14/05/30 18:05:54 ERROR remote.EndpointWriter: AssociationError
> [akka.tcp://sparkMaster@ip-10-100-184-45.ec2.internal:7077] ->
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]: Error [Association
> failed with [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: ip-10-100-75-70.ec2.internal/10.100.75.70:38485