We’ve been experiencing this as well. Our simple solution is to keep retrying the SSH connection rather than just waiting a fixed amount of time.
Something like this:

    import subprocess
    import time

    # `u` and `s` are helper modules from our own tooling:
    # u.message() logs a status line, s.ssh_command(opts) returns the base ssh command as a list.
    def wait_for_ssh_connection(opts, host):
        u.message("Waiting for ssh connection to host {}".format(host))
        while True:
            try:
                # Run a trivial remote command; check_call raises CalledProcessError on a non-zero exit.
                subprocess.check_call(s.ssh_command(opts) + ['-t', '-t', '%s@%s' % (opts.user, host), 'ls'])
                break
            except subprocess.CalledProcessError:
                print("SSH connection to host {} failed, retrying in 10 seconds...".format(host))
                time.sleep(10)
        print("SSH connection to host {} successfully established!".format(host))

HTH,

Pierre Borckmans
RealImpact Analytics | Brussels Office
www.realimpactanalytics.com | pierre.borckm...@realimpactanalytics.com
FR +32 485 91 87 31 | Skype pierre.borckmans

On 19 Apr 2014, at 06:51, Patrick Wendell <pwend...@gmail.com> wrote:

> Unfortunately, I think a lot of this is due to generally increased latency on EC2 itself. I've noticed that it's way more common than it used to be for instances to come online past the "wait" timeout in the ec2 script.
>
> On Fri, Apr 18, 2014 at 9:11 PM, FRANK AUSTIN NOTHAFT <fnoth...@berkeley.edu> wrote:
> Aureliano,
>
> I've been noticing this error recently as well:
>
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
>
> However, this isn't an issue with the spark-ec2 scripts. After the scripts fail, if you wait a bit longer (e.g., another 2 minutes), the EC2 hosts will finish launching and port 22 will open up. Until the EC2 host has launched and opened port 22 for SSH, SSH cannot succeed, and the spark-ec2 scripts will fail. I've noticed that EC2 machine launch latency seems to be highest in Oregon; I haven't run into this problem in either the California or Virginia EC2 regions. To work around this issue, I've manually modified my copy of the EC2 scripts to wait for 6 failures (i.e., 3 minutes), which seems to work OK. Might be worth a try on your end.
> I can't comment about the password request; I haven't seen that on my end.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>
> On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia <buendia...@gmail.com> wrote:
> Hi,
>
> Since 0.9.0, spark-ec2 has gone unstable. During launch it throws many errors like:
>
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
>
> ...and recently, it prompts for passwords!:
>
> Warning: Permanently added '' (RSA) to the list of known hosts.
> Password:
>
> Note that the hostname in Permanently added '' is missing in the log, which is probably why it asks for a password.
>
> Is this a known bug?
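P.S. If you'd rather not retry forever, a bounded variant closer to Frank's approach (give up after 6 failed attempts spaced 30 seconds apart, i.e. about 3 minutes) could probe TCP port 22 directly before handing off to ssh. This is only a sketch under my own assumptions: `wait_for_port` and its parameters are hypothetical names, not part of spark-ec2, and an open port only means sshd is listening, not that key-based login is already configured on the instance.

```python
import socket
import time


def wait_for_port(host, port=22, max_attempts=6, delay=30, timeout=5):
    """Hypothetical helper (not part of spark-ec2).

    Returns True as soon as a TCP connection to host:port succeeds,
    or False after max_attempts failed tries spaced `delay` seconds apart.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Raises socket.error (OSError on Python 3, which also covers
            # timeouts) while nothing is listening on the port yet.
            conn = socket.create_connection((host, port), timeout=timeout)
            conn.close()
            return True
        except socket.error:
            if attempt < max_attempts:
                print("Port {} on {} not open yet (attempt {}/{}), retrying in {} seconds...".format(
                    port, host, attempt, max_attempts, delay))
                time.sleep(delay)
    return False
```

Once it returns True, the ssh-based loop at the top of this message can take over; if it returns False, the launch can be aborted with a clear error instead of spinning indefinitely.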