Hi Ishaaq, answers inline from what I know; I'd like to be corrected, though.
On Tue, Apr 15, 2014 at 5:58 PM, ishaaq <ish...@gmail.com> wrote:

> Hi all,
> I am evaluating Spark to use here at my work.
>
> We have an existing Hadoop 1.x install which I am planning to upgrade to
> Hadoop 2.3.

This is not really a requirement for Spark; if you are doing it for some
other reason, great!

> I am trying to work out whether I should install YARN or simply just set
> up a Spark standalone cluster. We already use ZooKeeper so it isn't a
> problem to set up HA. I am puzzled however as to how the Spark nodes can
> coordinate on data locality - i.e., assuming I install the nodes on the
> same machines as the DFS data nodes, I don't understand how Spark can
> work out which nodes should get which splits of the jobs?

This happens exactly the same way Hadoop's MapReduce figures out data
locality: Spark reads data through Hadoop's InputFormats, which carry the
information about how the data is partitioned and on which hosts each
block lives, and the scheduler uses that to place tasks. So having Spark
workers share the same nodes as your DFS is a good idea. There is a small
sketch at the end of this mail showing how to inspect this.

> Anyway, my bigger question remains: YARN or standalone? Which is the more
> stable option currently? Which is the more future-proof option?

I think standalone is stable enough for all purposes, and Spark's YARN
support has been keeping up with the latest Hadoop versions too. It mostly
comes down to this: if you are already running YARN and don't want the
hassle of operating another cluster manager, you can prefer YARN;
otherwise a standalone cluster is the simpler setup. The second sketch at
the end shows how the choice surfaces in your application config.

> Thanks,
> Ishaaq
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/standalone-vs-YARN-tp4271.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
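P.S. A minimal sketch of the locality point above; the hostname and HDFS
path are placeholders you would replace with your own. textFile goes
through Hadoop's TextInputFormat, so each partition knows which datanodes
hold its block, and preferredLocations shows where Spark will try to run
the corresponding task:

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalityDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("locality-demo"))

        // textFile uses Hadoop's TextInputFormat, so every partition carries
        // the block locations that the HDFS NameNode reported for it.
        val rdd = sc.textFile("hdfs://namenode:8020/data/events.log")

        // preferredLocations lists the hosts the scheduler will try first
        // for each partition's task, i.e. the datanodes holding that block.
        rdd.partitions.take(3).foreach { p =>
          println(s"partition ${p.index} -> " +
            rdd.preferredLocations(p).mkString(", "))
        }

        sc.stop()
      }
    }

If your workers are colocated with the datanodes, the hosts it prints are
exactly your worker nodes, which is why colocation pays off.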
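And a sketch of how the standalone-vs-YARN choice shows up in your app;
the master hostnames and app name are made up. With ZooKeeper HA the
standalone master URL can list several masters:

    import org.apache.spark.{SparkConf, SparkContext}

    object SubmitDemo {
      def main(args: Array[String]): Unit = {
        // Standalone with ZooKeeper HA: list the masters; the driver talks
        // to whichever is the current leader and fails over if it dies.
        val standalone = new SparkConf()
          .setAppName("my-app")
          .setMaster("spark://master1:7077,master2:7077")

        // YARN (client mode): no host in the URL; the ResourceManager
        // address comes from the Hadoop config (HADOOP_CONF_DIR) on the
        // classpath.
        val onYarn = new SparkConf()
          .setAppName("my-app")
          .setMaster("yarn-client")

        val sc = new SparkContext(standalone) // or: new SparkContext(onYarn)
        println(sc.master)
        sc.stop()
      }
    }

Everything else in the job is identical either way, which is part of why
switching cluster managers later is not a big deal.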