You might use bin/shark-withdebug to find the exact cause of the failure. That said, the easiest way to get the cluster running again is to remove the dysfunctional machine from the Spark cluster (take it out of the slaves file). Hope that helps.
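For a standalone deployment that would look roughly like the following; this is just a sketch, and the exact script locations (bin/ vs. sbin/) and file paths depend on your Spark version and install directory:

    # on the master node, in the Spark install directory
    vi conf/slaves              # delete the line for the full-disk host
    ./bin/stop-all.sh           # stop master and workers
    ./bin/start-all.sh          # restart without the bad slave

After that the bad worker should no longer be registered, and Shark should be able to deploy to the remaining nodes.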
On Thu, May 22, 2014 at 9:04 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
> Hi, I am running into a pretty concerning issue with Shark (granted I'm
> running v. 0.8.1).
>
> I have a Spark slave node that has run out of disk space. When I try to
> start Shark it attempts to deploy the application to a directory on that
> node, fails and eventually gives up (I see a "Master Removed our
> application" message in the shark server log).
>
> Is Spark supposed to be able to ignore a slave if something goes wrong for
> it (I realize that the slave probably appears "alive" enough)? I restarted
> the Spark master in hopes that it would detect that the slave is suffering
> but it doesn't seem to be the case.
>
> Any thoughts appreciated -- we'll monitor disk space but I'm a little
> worried that the cluster is not functional on account of a single slave.