We’ve been experiencing this as well. Our simple solution is to keep retrying 
the ssh connection instead of just waiting, something like this:


import subprocess
import time

def wait_for_ssh_connection(opts, host):
  # u and s are modules from the surrounding script (not shown here)
  u.message("Waiting for ssh connection to host {}".format(host))
  connected = False
  while not connected:
    try:
      # check_call raises CalledProcessError on a non-zero exit status,
      # so reaching the next line means the remote "ls" succeeded
      subprocess.check_call(s.ssh_command(opts) + ['-t', '-t', '%s@%s' %
                            (opts.user, host), "ls"])
      connected = True
    except subprocess.CalledProcessError:
      print("Ssh connection to host {} failed, retrying in 10 seconds...".format(host))
      time.sleep(10)
  print("Ssh connection to host {} successfully established!".format(host))
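
The loop above retries forever, so a host that never comes up will hang the 
launch. A bounded variant might look like the minimal sketch below. It is not 
part of spark-ec2; the name wait_for_ssh, its parameters and the defaults are 
made up for illustration (six attempts, 30 seconds apart, loosely mirroring 
the numbers quoted further down). BatchMode=yes and ConnectTimeout are standard 
OpenSSH options; BatchMode makes ssh fail instead of prompting for a password.

```python
import subprocess
import time


def wait_for_ssh(host, user,
                 ssh_base=("ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10"),
                 attempts=6, delay=30,
                 run=subprocess.call, sleep=time.sleep):
    """Return True once a trivial remote command succeeds, False if every attempt fails.

    Hypothetical helper: `run` and `sleep` are injectable only so the retry
    logic can be exercised without a real host.
    """
    target = "{}@{}".format(user, host)
    for attempt in range(1, attempts + 1):
        # subprocess.call returns the exit status instead of raising;
        # ssh exits non-zero when the connection is refused.
        if run(list(ssh_base) + [target, "true"]) == 0:
            return True
        if attempt < attempts:
            print("Ssh connection to host {} failed (attempt {}/{}), "
                  "retrying in {} seconds...".format(host, attempt, attempts, delay))
            sleep(delay)
    return False
```

The caller can then decide what to do when it returns False (abort the launch, 
or try again with more attempts) rather than blocking indefinitely.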


HTH

Pierre Borckmans

RealImpact Analytics | Brussels Office
www.realimpactanalytics.com | pierre.borckm...@realimpactanalytics.com

FR +32 485 91 87 31 | Skype pierre.borckmans





On 19 Apr 2014, at 06:51, Patrick Wendell <pwend...@gmail.com> wrote:

> Unfortunately - I think a lot of this is due to generally increased latency 
> on ec2 itself. I've noticed that it's way more common than it used to be for 
> instances to come online past the "wait" timeout in the ec2 script.
> 
> 
> On Fri, Apr 18, 2014 at 9:11 PM, FRANK AUSTIN NOTHAFT <fnoth...@berkeley.edu> 
> wrote:
> Aureliano,
> 
> I've been noticing this error recently as well:
> 
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: 
> Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
> 
> However, this isn't an issue with the spark-ec2 scripts. After the scripts 
> fail, if you wait a bit longer (e.g., another 2 minutes), the EC2 hosts will 
> finish launching and port 22 will open up. Until the EC2 host has launched 
> and opened port 22 for SSH, SSH cannot succeed, and the spark-ec2 scripts 
> will fail. I've noticed that EC2 machine launch latency seems to be highest 
> in Oregon; I haven't run into this problem on either the California or 
> Virginia EC2 farms. To work around this issue, I've manually modified my copy 
> of the EC2 scripts to wait for 6 failures (i.e., 3 minutes), which seems to 
> work OK. Might be worth a try on your end. I can't comment about the password 
> request; I haven't seen that on my end.
> 
> Regards,
> 
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
> 
> 
> On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia <buendia...@gmail.com> 
> wrote:
> Hi,
> 
> Since 0.9.0 spark-ec2 has gone unstable. During launch it throws many errors 
> like:
> 
> ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: 
> Connection refused
> Error 255 while executing remote command, retrying after 30 seconds
> 
> .. and recently, it prompts for passwords!:
> 
> Warning: Permanently added '' (RSA) to the list of known hosts.
> Password:
> 
> Note that the hostname in Permanently added '' is missing in the log, which 
> is probably why it asks for a password.
> 
> Is this a known bug?
> 
> 
