Yeah, as Shivaram mentioned, this issue is well-known. It's documented in
SPARK-5189 <https://issues.apache.org/jira/browse/SPARK-5189> and a bunch
of related issues. Unfortunately, it's hard to resolve this issue in
spark-ec2 without rewriting large parts of the project. But if you take a
crack at it and succeed I'm sure a lot of people will be happy.
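
For anyone who does take a crack at it, here is a rough sketch of the idea
Christian describes below: each node pulls the files from S3 itself, in
parallel, and nodes that aren't reachable yet are retried rather than
skipped. This is not how spark-ec2 works today. It assumes the AWS CLI and
credentials are already on every node, and the function names, bucket path
and retry defaults are made up for illustration.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def wait_and_pull(host, s3_path, dest, run=subprocess.call, retries=5, delay=10.0):
    """Retry until the node accepts SSH, then have it pull from S3 itself.

    Hypothetical helper: assumes `aws s3 sync` is usable on the remote node.
    """
    cmd = ["ssh", "-o", "ConnectTimeout=5", host,
           "aws", "s3", "sync", s3_path, dest]
    for attempt in range(retries):
        if run(cmd) == 0:  # exit code 0: SSH connected and the sync succeeded
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # node may still be booting; wait and retry
    return False


def pull_on_all(hosts, s3_path, dest, run=subprocess.call,
                retries=5, delay=10.0, workers=64):
    """Run wait_and_pull on every host concurrently; return {host: succeeded}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda h: wait_and_pull(h, s3_path, dest, run, retries, delay),
            hosts,
        )
        return dict(zip(hosts, results))
```

Calling something like pull_on_all(slaves, "s3://my-bucket/spark", "/root/spark")
would give you back a dict of which nodes never came up, so the launch can
report them instead of silently moving on.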

I've started a separate project <https://github.com/nchammas/flintrock> --
which Shivaram also mentioned -- which aims to solve the problem of long
launch times and other issues
<https://github.com/nchammas/flintrock#motivation> with spark-ec2. It's
still very young and lacks several critical features, but we are making
steady progress.

Nick

On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> It is a known limitation that spark-ec2 is very slow for large
> clusters and, as you mention, most of this is due to the use of rsync to
> transfer things from the master to all the slaves.
>
> Nick (cc'd) has been working on an alternative approach at
> https://github.com/nchammas/flintrock that is more scalable.
>
> Thanks
> Shivaram
>
> On Thu, Nov 5, 2015 at 8:12 AM, Christian <engr...@gmail.com> wrote:
> > For starters, thanks for the awesome product!
> >
> > When creating ec2-clusters of 20-40 nodes, things work great. When we
> > create a cluster with the provided spark-ec2 script, it takes hours. When
> > creating a 200-node cluster, it takes 2 1/2 hours, and for a 500-node
> > cluster it takes over 5 hours. One other problem we are having is that
> > some nodes don't come up when the other ones do, and the process seems to
> > just move on, skipping the rsync and any installs on those ones.
> >
> > My guess is that it takes so long to set up a large cluster because of
> > the use of rsync. What if, instead of using rsync, you synced to S3 and
> > then did a pdsh to pull it down on all of the machines? This is a big
> > deal for us, and if we can come up with a good plan, we might be able to
> > help out with the required changes.
> >
> > Are there any suggestions on how to deal with some of the nodes not being
> > ready when the process starts?
> >
> > Thanks for your time,
> > Christian
> >
>
