Re: Job initialization performance of Spark standalone mode vs YARN

2014-04-04 Thread Ron Gonzalez
Hi, Can you explain a little more what's going on? Which one submits a job to the yarn cluster that creates an application master and spawns containers for the local jobs? I tried yarn-client and submitted to our yarn cluster and it seems to work that way. Shouldn't Client.scala be running wi

Re: Job initialization performance of Spark standalone mode vs YARN

2014-04-03 Thread Kevin Markey
We are now testing precisely what you ask about in our environment.  But Sandy's questions are relevant.  The bigger issue is not Spark vs. Yarn but "client" vs. "standalone" and where the client is located on the network relative to the cluster. The "client" options

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread Mayur Rustagi
Would you be the best person in the world & share some code? It's a pretty common problem. On Mar 6, 2014 6:36 PM, "polkosity" wrote: > We're not using Ooyala's job server. We are holding the spark context for > reuse within our own REST server (with a service to run each job). > > Our low-laten

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread polkosity
We're not using Ooyala's job server. We are holding the Spark context for reuse within our own REST server (with a service to run each job). Our low-latency job now reads all its data from a memory-cached RDD instead of from an HDFS seq file (upstream jobs cache resultant RDDs for downstream jobs t
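
The pattern described in this message (one long-lived context held by a service, with cached data shared across jobs) can be sketched roughly as follows. This is only an illustration in standard-library Python, not Spark code and not the poster's implementation: `ExpensiveContext`, `JobService`, `load_events`, and the 0.05-second start-up delay are all hypothetical stand-ins chosen for the example.

```python
import threading
import time


class ExpensiveContext:
    """Hypothetical stand-in for a SparkContext: slow to create, cheap to reuse."""

    instances_created = 0

    def __init__(self, startup_seconds=0.05):
        time.sleep(startup_seconds)  # simulates the start-up cost paid per context
        ExpensiveContext.instances_created += 1
        self._cache = {}

    def cached(self, name, loader):
        """Return the dataset stored under `name`, calling `loader` only once."""
        if name not in self._cache:
            self._cache[name] = loader()
        return self._cache[name]


class JobService:
    """Holds a single context for the life of the process and runs jobs on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._context = None

    def run_job(self, dataset_name, loader, job):
        # The lock guards lazy creation of the context and the cache lookup.
        with self._lock:
            if self._context is None:
                self._context = ExpensiveContext()
            data = self._context.cached(dataset_name, loader)
        return job(data)


load_calls = []


def load_events():
    """Hypothetical upstream load; records each call so reuse is observable."""
    load_calls.append(1)
    return list(range(1000))


service = JobService()
total = service.run_job("events", load_events, sum)    # pays start-up and load once
largest = service.run_job("events", load_events, max)  # reuses context and cached data
```

The second job skips both the context start-up and the data load, which is the effect the thread attributes to reusing the context and caching RDDs. How large the saving is in a real cluster depends on the workload.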

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread Mayur Rustagi
Are you using the job server or just reusing the Spark context? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Wed, Mar 5, 2014 at 10:30 PM, polkosity wrote: > After changing to reuse spark context and cache RDDs in memory, pe

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-05 Thread polkosity
After changing to reuse spark context and cache RDDs in memory, performance is 4 times better. We didn't expect that much of an improvement! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-tp20

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Koert Kuipers
To be more precise, the difference depends on the de-serialization overhead from Kryo for your data structures. On Mon, Mar 3, 2014 at 8:21 PM, Koert Kuipers wrote: > yes, tachyon is in memory serialized, which is not as fast as cached in > memory in spark (not serialized). the difference really de

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Koert Kuipers
Yes, Tachyon keeps data in memory in serialized form, which is not as fast as data cached in memory in Spark (not serialized). The difference really depends on your job type. On Mon, Mar 3, 2014 at 7:10 PM, polkosity wrote: > Thats exciting! Will be looking into that, thanks Andrew. > > Related topic, has anyone
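
The distinction drawn here between serialized and non-serialized in-memory storage can be illustrated with a small sketch. It is an assumption-laden toy: Python's `pickle` stands in for Kryo, the two cache classes are hypothetical, and neither Tachyon nor Spark is modeled. It only shows where the extra deserialization step occurs on each read.

```python
import pickle


class DeserializedCache:
    """Keeps live objects; a read hands back the stored object directly."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]


class SerializedCache:
    """Keeps bytes; every read pays a deserialization step (pickle, not Kryo)."""

    def __init__(self):
        self._store = {}
        self.deserializations = 0

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        self.deserializations += 1
        return pickle.loads(self._store[key])


records = [{"id": i, "tags": ["a", "b"]} for i in range(100)]

live = DeserializedCache()
live.put("records", records)

packed = SerializedCache()
packed.put("records", records)

# Three reads from each cache: the serialized one rebuilds the objects every time.
for _ in range(3):
    from_live = live.get("records")
    from_packed = packed.get("records")
```

How much the per-read rebuild costs depends on the shape of the data being deserialized, which matches the point made in the thread that the difference depends on the job and its data structures.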

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Mayur Rustagi
+1 Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Mon, Mar 3, 2014 at 4:10 PM, polkosity wrote: > Thats exciting! Will be looking into that, thanks Andrew. > > Related topic, has anyone had any experience running Spa

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread polkosity
That's exciting! Will be looking into that, thanks Andrew. Related topic, has anyone had any experience running Spark on the Tachyon in-memory filesystem, and could offer their views on using it?

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Andrew Ash
polkosity, have you seen the job server that Ooyala open sourced? I think it's very similar to what you're proposing with a REST API and re-using a SparkContext. https://github.com/apache/incubator-spark/pull/222 http://engineering.ooyala.com/blog/open-sourcing-our-spark-job-server On Mon, Mar

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread polkosity
We're thinking of creating a Spark job server with a REST API, which would enable us (as well as managing jobs) to re-use the spark context as you suggest. Thanks Koert!

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Sandy Ryza
Are you running in yarn-standalone mode or yarn-client mode? Also, what YARN scheduler and what NodeManager heartbeat? On Sun, Mar 2, 2014 at 9:41 PM, polkosity wrote: > Thanks for the advice Mayur. > > I thought I'd report back on the performance difference... Spark > standalone > mode has e

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread Koert Kuipers
If you need a quick response, re-use your Spark context between queries and cache RDDs in memory. On Mar 3, 2014 12:42 AM, "polkosity" wrote: > Thanks for the advice Mayur. > > I thought I'd report back on the performance difference... Spark > standalone > mode has executors processing at capacity i

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-02 Thread polkosity
Thanks for the advice Mayur. I thought I'd report back on the performance difference... Spark standalone mode has executors processing at capacity in under a second :)

Re: Job initialization performance of Spark standalone mode vs YARN

2014-02-24 Thread Mayur Rustagi
Mayur Rustagi Ph: +919632149971 http://www.sigmoidanalytics.com https://twitter.com/mayur_rustagi On Mon, Feb 24, 2014 at 10:22 PM, polkosity wrote: > Is there any difference in the performance of Spark standalone mode and > YARN > when it comes to initializ