Re: Replacing Spark's native scheduler with Sparrow

2014-11-10 Thread Tathagata Das
Too bad Nick, I don't have anything immediately ready that tests Spark Streaming with those extreme settings. :) On Mon, Nov 10, 2014 at 9:56 AM, Nicholas Chammas wrote: > On Sun, Nov 9, 2014 at 1:51 AM, Tathagata Das > wrote: >> >> This causes a scalability vs. latency tradeoff - if your limit i

Re: Replacing Spark's native scheduler with Sparrow

2014-11-10 Thread Nicholas Chammas
On Sun, Nov 9, 2014 at 1:51 AM, Tathagata Das wrote: > This causes a scalability vs. latency tradeoff - if your limit is 1000 > tasks per second (simplifying from 1500), you could either configure > it to use 100 receivers at 100 ms batches (10 blocks/sec), or 1000 > receivers at 1 second batches
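
The arithmetic behind the quoted tradeoff can be sketched roughly as follows. This is only an illustration, assuming each receiver produces one block (and therefore one task) per batch, which is how the quoted "100 ms batches (10 blocks/sec)" figure reads; the function names are hypothetical and not part of any Spark API.

```python
def tasks_per_second(num_receivers: int, batch_interval_ms: float) -> float:
    """Task launch rate, ASSUMING one block (one task) per receiver per batch."""
    batches_per_second = 1000.0 / batch_interval_ms
    return num_receivers * batches_per_second


def max_receivers(task_limit_per_second: float, batch_interval_ms: float) -> int:
    """Largest receiver count that stays within a scheduler's task-launch limit."""
    batches_per_second = 1000.0 / batch_interval_ms
    return int(task_limit_per_second // batches_per_second)


if __name__ == "__main__":
    # Both configurations from the quoted message hit the same 1000 tasks/sec limit.
    print(tasks_per_second(100, 100))    # 100 receivers, 100 ms batches
    print(tasks_per_second(1000, 1000))  # 1000 receivers, 1 second batches
    print(max_receivers(1000, 100))
```

Under that assumption, shrinking the batch interval by 10x cuts the number of receivers the scheduler can sustain by the same factor, which is the scalability vs. latency tradeoff described in the quote.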

Re: Replacing Spark's native scheduler with Sparrow

2014-11-08 Thread Tathagata Das
Let me chime in on the discussion as well. Spark Streaming is another use case where the scheduler's task-launching throughput and task-latency can limit the batch interval and the overall latencies achievable by Spark Streaming. Let's say we want to do batches of 20 ms (to achieve end-to-end latenc

Re: Replacing Spark's native scheduler with Sparrow

2014-11-08 Thread Michael Armbrust
> > However, I haven't seen it be as > high as the 100ms Michael quoted (maybe this was for jobs with tasks that > have much larger objects that take a long time to deserialize?). > I was thinking more about the average end-to-end latency for launching a query that has 100s of partitions. It's also

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
Yeah, perhaps I misunderstood what Michael was saying. But thanks for pointing out the relevant UI functionality. On Sat, Nov 8, 2014 at 1:22 AM, Kay Ousterhout wrote: > I don't have much more info than what Shivaram said. My sense is that, > over time, task launch overhead with Spark has slowl

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Kay Ousterhout
I don't have much more info than what Shivaram said. My sense is that, over time, task launch overhead with Spark has slowly grown as Spark supports more and more functionality. However, I haven't seen it be as high as the 100ms Michael quoted (maybe this was for jobs with tasks that have much la

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Shivaram Venkataraman
I think Kay might be able to give a better answer. The most recent benchmark I remember had the number at somewhere between 8.6ms and 14.6ms depending on the Spark version (https://github.com/apache/spark/pull/2030#issuecomment-52715181). Another point to note is that this is the total time to

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
Hmm, relevant quote from section 3.3: newer frameworks like Spark [35] reduce the overhead to 5ms. To support > tasks that complete in hundreds of milliseconds, we argue for reducing > task launch overhead even further to 1ms so that launch overhead > constitutes at most 1% of task runtime. By m
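
As a rough check of the figures in that quote, the overhead share can be computed as launch overhead divided by task runtime. This is a minimal sketch; the 100 ms runtime is an assumed lower bound for "hundreds of milliseconds", and the function name is hypothetical.

```python
def launch_overhead_fraction(overhead_ms: float, task_runtime_ms: float) -> float:
    """Launch overhead expressed as a fraction of task runtime."""
    if task_runtime_ms <= 0:
        raise ValueError("task_runtime_ms must be positive")
    return overhead_ms / task_runtime_ms


if __name__ == "__main__":
    # ASSUMPTION: a 100 ms task, the short end of "hundreds of milliseconds".
    for overhead_ms in (5.0, 1.0):
        share = launch_overhead_fraction(overhead_ms, 100.0)
        print(f"{overhead_ms} ms overhead on a 100 ms task: {share:.0%}")
```

With those assumed numbers, a 5 ms launch overhead is 5% of a 100 ms task, while 1 ms keeps it at the 1% bound the quote argues for.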

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Shivaram Venkataraman
On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas wrote: > Sounds good. I'm looking forward to tracking improvements in this area. > > Also, just to connect some more dots here, I just remembered that there is > currently an initiative to add an IndexedRDD >

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
Sounds good. I'm looking forward to tracking improvements in this area. Also, just to connect some more dots here, I just remembered that there is currently an initiative to add an IndexedRDD interface. Some interesting use cases mentioned there i

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Kay Ousterhout
On Fri, Nov 7, 2014 at 6:20 PM, Nicholas Chammas wrote: > If, for example, you have a cluster of 100 machines, this means the >> scheduler can launch 150 tasks per machine per second. > > > Did you mean 15 tasks per machine per second here? Or alternatively, 10 > machines? > Yes -- sorry for the

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
> > If, for example, you have a cluster of 100 machines, this means the > scheduler can launch 150 tasks per machine per second. Did you mean 15 tasks per machine per second here? Or alternatively, 10 machines? I don't know of any existing Spark clusters that have a large enough number > of mach
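
The correction being asked about is a simple division. The sketch below assumes the aggregate rate under discussion is 1500 tasks per second, which is the figure both suggested alternatives imply; the function name is hypothetical.

```python
def tasks_per_machine_per_second(total_tasks_per_second: float, num_machines: int) -> float:
    """Evenly spread an aggregate task-launch rate across the machines in a cluster."""
    if num_machines <= 0:
        raise ValueError("num_machines must be positive")
    return total_tasks_per_second / num_machines


if __name__ == "__main__":
    # ASSUMPTION: 1500 tasks/sec aggregate scheduler throughput.
    print(tasks_per_machine_per_second(1500, 100))  # 100 machines
    print(tasks_per_machine_per_second(1500, 10))   # 10 machines
```

Under that assumption, 100 machines gives 15 tasks per machine per second, and the quoted 150 only works out for a 10-machine cluster.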

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Kay Ousterhout
ed on much larger clusters, such that Sparrow will be necessary! -Kay On Fri, Nov 7, 2014 at 3:05 PM, Nicholas Chammas wrote: > I just watched Kay's talk from 2013 on Sparrow > <https://www.youtube.com/watch?v=ayjH_bG-RC0>. Is replacing Spark's native > scheduler with Sparrow

Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
I just watched Kay's talk from 2013 on Sparrow <https://www.youtube.com/watch?v=ayjH_bG-RC0>. Is replacing Spark's native scheduler with Sparrow still on the books? The Sparrow repo <https://github.com/radlab/sparrow> hasn't been updated recently, and I don't s