Hi,

My understanding of Spark on YARN, and of Spark in general, is very limited, so keep that in mind.
I'm not sure why you're comparing yarn-cluster and Spark Standalone? In yarn-cluster mode the driver runs on a node inside the YARN cluster, while Standalone (in the default client deploy mode) keeps the driver on the machine you launched the Spark application from. Also, YARN supports retrying failed applications while Standalone doesn't. There's also support for rack locality preference (though I don't know if or where Spark uses it).

My limited understanding suggests using Spark on YARN if you're planning to use Hadoop/HDFS and to submit jobs through YARN. Standalone is the entry-level option; requiring YARN up front could make it harder to introduce Spark to organizations that don't already run Hadoop YARN. Just my two cents.

Regards,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Fri, Nov 27, 2015 at 8:36 AM, cs user <acldstk...@gmail.com> wrote:
> Hi All,
>
> Apologies if this question has been asked before. I'd like to know if there
> are any downsides to running Spark over YARN with the --master yarn-cluster
> option vs having a separate Spark Standalone cluster to execute jobs?
>
> We're looking at installing an HDFS/Hadoop cluster with Ambari and submitting
> jobs to the cluster using YARN, or having an Ambari cluster and a separate
> Standalone Spark cluster, which will run the Spark jobs on data within HDFS.
>
> With YARN, will we still get all the benefits of Spark?
>
> Will it be possible to process streaming data?
>
> Many thanks in advance for any responses.
>
> Cheers!

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
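P.S. The deploy-mode difference shows up directly in how you invoke spark-submit. A minimal sketch, assuming a Spark 1.x installation; the class name, jar path, master hostname, and retry count below are hypothetical placeholders (and note that in Spark 2.0+ the `--master yarn-cluster` shorthand was replaced by `--master yarn --deploy-mode cluster`):

```shell
# YARN cluster mode: the driver runs inside an ApplicationMaster container
# on a YARN node, so YARN can restart the whole application on failure
# (spark.yarn.maxAppAttempts controls how many attempts are allowed).
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.maxAppAttempts=2 \
  --class com.example.MyApp \
  my-app.jar

# Standalone mode (default client deploy mode): the driver stays on the
# machine you run spark-submit from; the master URL here is hypothetical.
spark-submit \
  --master spark://standalone-master:7077 \
  --class com.example.MyApp \
  my-app.jar
```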