Hi Nick,

This hasn't yet been directly supported by Spark because of a lack of demand. The last time I ran a throughput test on the default Spark scheduler (~1 year ago, so this may have changed), it could launch approximately 1500 tasks per second. If, for example, you have a cluster of 100 machines, this means the scheduler can launch 15 tasks per machine per second. I don't know of any existing Spark clusters that have a large enough number of machines or short enough tasks to justify the added complexity of distributing the scheduler. Eventually I hope to see Spark used on much larger clusters, such that Sparrow will be necessary!

-Kay

On Fri, Nov 7, 2014 at 3:05 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> I just watched Kay's talk from 2013 on Sparrow
> <https://www.youtube.com/watch?v=ayjH_bG-RC0>. Is replacing Spark's native
> scheduler with Sparrow still on the books?
>
> The Sparrow repo <https://github.com/radlab/sparrow> hasn't been updated
> recently, and I don't see any JIRA issues about it.
>
> It would be good to at least have a JIRA issue to track progress on this if
> it's a long-term goal.
>
> Nick
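
The throughput arithmetic in Kay's reply can be sketched as follows. This is only a back-of-the-envelope illustration, not anything taken from Spark or Sparrow: the function names are hypothetical, the 1500 tasks per second figure is the roughly year-old measurement quoted in the reply, and the 8 task slots per machine is an assumed value chosen purely for the example. It also assumes tasks are spread evenly across machines.

```python
def per_machine_launch_rate(scheduler_tasks_per_sec: float, num_machines: int) -> float:
    """Tasks per second a single centralized scheduler can launch on each
    machine, assuming launches are spread evenly across the cluster."""
    if num_machines <= 0:
        raise ValueError("num_machines must be positive")
    return scheduler_tasks_per_sec / num_machines


def min_task_duration_to_saturate(
    scheduler_tasks_per_sec: float, num_machines: int, slots_per_machine: int
) -> float:
    """Shortest average task duration (in seconds) at which the scheduler can
    still keep every slot busy. Keeping N slots full with tasks of duration d
    needs N / d launches per second, so d must be at least N / throughput."""
    if scheduler_tasks_per_sec <= 0:
        raise ValueError("scheduler_tasks_per_sec must be positive")
    total_slots = num_machines * slots_per_machine
    return total_slots / scheduler_tasks_per_sec


if __name__ == "__main__":
    throughput = 1500   # tasks/second, the figure quoted in the reply
    machines = 100      # example cluster size from the reply
    slots = 8           # ASSUMPTION: task slots per machine, illustrative only

    print("launches per machine per second:",
          per_machine_launch_rate(throughput, machines))
    print("minimum task duration (s) before the scheduler is the bottleneck:",
          min_task_duration_to_saturate(throughput, machines, slots))
```

Under these assumptions, the centralized scheduler only becomes the limiting factor when tasks are shorter than the second value printed, or when the cluster grows enough to push that value up to typical task lengths, which is the regime where a distributed scheduler such as Sparrow would start to pay off.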