Alright, that's good to know. And I guess the first of these errors can be prevented by increasing the wait time via --wait.
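For the archives, here's roughly what I have in mind -- the key pair name, identity file, slave count, wait value, and cluster name below are just placeholders for my own setup, and I'm assuming --wait takes a value in seconds:

    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 --wait=240 launch my-cluster

In other words, just give the instances more headroom before the script makes its first SSH attempts.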
Thank you.

Nick


On Mon, Feb 24, 2014 at 9:04 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> Replies inline
>
> On Mon, Feb 24, 2014 at 5:26 PM, nicholas.chammas
> <nicholas.cham...@gmail.com> wrote:
> > I'm seeing a bunch of (apparently) non-critical errors when launching new
> > clusters with spark-ec2 0.9.0.
> >
> > Here are some of them (emphasis added; names redacted):
> >
> > Generating cluster's SSH key on master...
> >
> > ssh: connect to host ec2-<redacted>.compute-1.amazonaws.com port 22:
> > Connection refused
> >
> > Error executing remote command, retrying after 30 seconds: Command '['ssh',
> > '-o', 'StrictHostKeyChecking=no', '-i',
> > '/Users/<redacted>/<redacted>.pem.txt', '-t', '-t',
> > u'root@ec2-<redacted>.compute-1.amazonaws.com', "\n [ -f ~/.ssh/id_rsa
> > ] ||\n (ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa &&\n cat
> > ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys)\n "]' returned non-zero exit
> > status 255
>
> This is harmless -- EC2 instances sometimes take longer than we expect
> to start up. The retry mechanism handles things like that.
>
> > ...
> >
> > Unpacking Spark
> > ~/spark-ec2
> > Initializing shark
> > ~ ~/spark-ec2
> > ERROR: Unknown Shark version
> > Initializing ephemeral-hdfs
> > ~ ~/spark-ec2 ~/spark-ec2
> >
> > ...
>
> I think this error happens when there isn't a Shark version
> corresponding to a Spark release. This is the case right now for 0.9, I
> think. The downside of this is that Shark will not be available on
> your cluster.
>
> > RSYNC'ing /root/hive* to slaves...
> > ec2-<redacted>.compute-1.amazonaws.com
> > rsync: link_stat "/root/hive*" failed: No such file or directory (2)
> > rsync error: some files/attrs were not transferred (see previous errors)
> > (code 23) at main.c(1039) [sender=3.0.6]
>
> I think this is also due to the Shark error. I am not very sure.
>
> > Setting up ephemeral-hdfs
> >
> > ...
> >
> > RSYNC'ing /etc/ganglia to slaves...
> > ec2-<redacted>.compute-1.amazonaws.com
> > Shutting down GANGLIA gmond: [FAILED]
> > Starting GANGLIA gmond: [ OK ]
> > Shutting down GANGLIA gmond: [FAILED]
> > Starting GANGLIA gmond: [ OK ]
> > Connection to ec2-<redacted>.compute-1.amazonaws.com closed.
> > Shutting down GANGLIA gmetad: [FAILED]
> > Starting GANGLIA gmetad: [ OK ]
> > Stopping httpd: [FAILED]
> > Starting httpd: [ OK ]
>
> This is expected and just comes from using service restarts on Linux.
> It just says that ganglia isn't running when we try to stop it.
>
> > Are these errors known to be harmless, or somehow expected/normal?
> >
> > When I log in to the cluster the shell starts up fine.
> >
> > Nick