Congrats to everyone who helped make this happen. And if anyone has even more machines they'd like us to run on next year, let us know :).
Matei

> On Nov 5, 2014, at 3:11 PM, Reynold Xin <r...@databricks.com> wrote:
>
> Hi all,
>
> We are excited to announce that the benchmark entry has been reviewed by
> the Sort Benchmark committee and Spark has officially won the Daytona
> GraySort contest in sorting 100 TB of data.
>
> Our entry tied with a UCSD research team building high-performance systems,
> and we jointly set a new world record. This is an important milestone for
> the project, as it validates the amount of engineering work put into Spark
> by the community.
>
> As Matei said, "For an engine to scale from these multi-hour petabyte batch
> jobs down to 100-millisecond streaming and interactive queries is quite
> uncommon, and it's thanks to all of you folks that we are able to make this
> happen."
>
> Updated blog post:
> http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
>
> On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> I interrupt your regularly scheduled user / dev list to bring you some
>> pretty cool news for the project: we've been able to use Spark to break
>> MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x
>> fewer nodes. There's a detailed writeup at
>> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html.
>> Summary: while Hadoop MapReduce held last year's 100 TB world record by
>> sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on
>> 206 nodes; and we also scaled up to sort 1 PB in 234 minutes.
>>
>> I want to thank Reynold Xin for leading this effort over the past few
>> weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson, and Ali
>> Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for
>> providing the machines that made this possible. Finally, this result would
>> of course not have been possible without the many, many other
>> contributions, tests, and feature requests from throughout the community.
>>
>> For an engine to scale from these multi-hour petabyte batch jobs down to
>> 100-millisecond streaming and interactive queries is quite uncommon, and
>> it's thanks to all of you folks that we are able to make this happen.
>>
>> Matei
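(For scale, the numbers above work out to roughly 72 / 23 ≈ 3.1x faster on 2100 / 206 ≈ 10.2x fewer nodes.) For anyone curious what a sort like this looks like at the API level, below is a minimal, hypothetical sketch against Spark's 1.x RDD API. This is emphatically not the tuned benchmark code: the real entry worked on binary 100-byte records and a heavily optimized shuffle path, while this sketch treats records as text lines for simplicity. The HDFS paths and the object/app names are placeholders; the 100-byte record layout (10-byte key, 90-byte value) comes from the sort benchmark's specification.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark < 1.3)

object GraySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graysort-sketch"))

    // Hypothetical input path. Each GraySort record is 100 bytes:
    // a 10-byte key followed by a 90-byte value; here we pretend the
    // records arrive as text lines, unlike the real binary benchmark data.
    val records = sc.textFile("hdfs:///sort-input")
      .map(line => (line.take(10), line.drop(10)))

    // sortByKey performs a distributed, range-partitioned sort: keys are
    // sampled to build partition boundaries, data is shuffled into ranges,
    // and each partition is sorted locally.
    val sorted = records.sortByKey()

    sorted.map { case (key, value) => key + value }
      .saveAsTextFile("hdfs:///sort-output") // hypothetical output path

    sc.stop()
  }
}

The point of the sketch is just that the whole distributed sort is a single sortByKey call; all the record-breaking work happened underneath it, in the shuffle and I/O layers described in the blog posts above.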