Alright, that's good to know. And I guess the first of these errors can be prevented by increasing the wait time via --wait.
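For the archives, here's roughly what I have in mind -- the key pair name, identity file, slave count, wait value, and cluster name below are just placeholders for my own setup, and I'm assuming --wait takes a value in seconds:

    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 --wait=240 launch my-cluster

In other words, just give the instances more headroom before the script makes its first SSH attempts.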
Thank you.

Nick


On Mon, Feb 24, 2014 at 9:04 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> Replies inline
>
> On Mon, Feb 24, 2014 at 5:26 PM, nicholas.chammas
> <nicholas.cham...@gmail.com> wrote:
> > I'm seeing a bunch of (apparently) non-critical errors when launching new
> > clusters with spark-ec2 0.9.0.
> >
> > Here are some of them (emphasis added; names redacted):
> >
> > Generating cluster's SSH key on master...
> >
> > ssh: connect to host ec2-<redacted>.compute-1.amazonaws.com port 22:
> > Connection refused
> >
> > Error executing remote command, retrying after 30 seconds: Command '['ssh',
> > '-o', 'StrictHostKeyChecking=no', '-i',
> > '/Users/<redacted>/<redacted>.pem.txt', '-t', '-t',
> > u'root@ec2-<redacted>.compute-1.amazonaws.com', "\n [ -f ~/.ssh/id_rsa
> > ] ||\n (ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa &&\n cat
> > ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys)\n "]' returned non-zero exit
> > status 255
>
> This is harmless -- EC2 instances sometimes take longer than we expect
> to start up. The retry mechanism handles things like that.
>
> > ...
> >
> > Unpacking Spark
> > ~/spark-ec2
> > Initializing shark
> > ~ ~/spark-ec2
> > ERROR: Unknown Shark version
> > Initializing ephemeral-hdfs
> > ~ ~/spark-ec2 ~/spark-ec2
> >
> > ...
>
> I think this error happens when there isn't a Shark version
> corresponding to a Spark release. This is the case right now for 0.9, I
> think. The downside of this is that Shark will not be available on
> your cluster.
>
> > RSYNC'ing /root/hive* to slaves...
> > ec2-<redacted>.compute-1.amazonaws.com
> > rsync: link_stat "/root/hive*" failed: No such file or directory (2)
> > rsync error: some files/attrs were not transferred (see previous errors)
> > (code 23) at main.c(1039) [sender=3.0.6]
>
> I think this is also due to the Shark error. I am not very sure.
>
> > Setting up ephemeral-hdfs
> >
> > ...
> >
> > RSYNC'ing /etc/ganglia to slaves...
> > ec2-<redacted>.compute-1.amazonaws.com
> > Shutting down GANGLIA gmond: [FAILED]
> > Starting GANGLIA gmond: [ OK ]
> > Shutting down GANGLIA gmond: [FAILED]
> > Starting GANGLIA gmond: [ OK ]
> > Connection to ec2-<redacted>.compute-1.amazonaws.com closed.
> > Shutting down GANGLIA gmetad: [FAILED]
> > Starting GANGLIA gmetad: [ OK ]
> > Stopping httpd: [FAILED]
> > Starting httpd: [ OK ]
>
> This is expected and just comes from using service restarts on Linux.
> It just says that ganglia isn't running when we try to stop it.
>
> > Are these errors known to be harmless, or somehow expected/normal?
> >
> > When I log in to the cluster the shell starts up fine.
> >
> > Nick