Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Matei Zaharia
Congrats to everyone who helped make this happen. And if anyone has even more machines they'd like us to run on next year, let us know :). Matei > On Nov 5, 2014, at 3:11 PM, Reynold Xin wrote: > > Hi all, > > We are excited to announce that the benchmark entry has been reviewed by > the Sort

Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Reynold Xin
Hi all, We are excited to announce that the benchmark entry has been reviewed by the Sort Benchmark committee and Spark has officially won the Daytona GraySort contest in sorting 100TB of data. Our entry tied with a UCSD research team building high performance systems and we jointly set a new wor

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Krishna Sankar
Well done guys. MapReduce sort at that time was a good feat and Spark now has raised the bar with the ability to sort a PB. Like some of the folks in the list, a summary of what worked (and didn't) as well as the monitoring practices would be good. Cheers P.S: What are you folks planning next ? O

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Ilya Ganelin
Thank you for the details! Would you mind speaking to what tools proved most useful as far as identifying bottlenecks or bugs? Thanks again. On Oct 13, 2014 5:36 PM, "Matei Zaharia" wrote: > The biggest scaling issue was supporting a large number of reduce tasks > efficiently, which the JIRAs in

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Matei Zaharia
The biggest scaling issue was supporting a large number of reduce tasks efficiently, which the JIRAs in that post handle. In particular, our current default shuffle (the hash-based one) has each map task open a separate file output stream for each reduce task, which wastes a lot of memory (since
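
[Editor's sketch, not part of the original message.] To make the scaling problem above concrete, here is a rough back-of-the-envelope calculation of how many output streams a hash-based shuffle keeps open when each map task opens one stream per reduce task. The function names, the task counts, and the per-stream buffer size are all hypothetical values chosen for illustration. The assumption that each open stream holds its own in-memory buffer is also ours, since the message is cut off before it gives the reason.

```python
def hash_shuffle_open_streams(concurrent_map_tasks: int, reduce_tasks: int) -> int:
    """Streams open at once if every running map task keeps one stream per reduce task."""
    return concurrent_map_tasks * reduce_tasks


def hash_shuffle_buffer_bytes(concurrent_map_tasks: int,
                              reduce_tasks: int,
                              buffer_bytes_per_stream: int) -> int:
    """Total buffer memory, ASSUMING each open stream holds its own fixed-size buffer."""
    streams = hash_shuffle_open_streams(concurrent_map_tasks, reduce_tasks)
    return streams * buffer_bytes_per_stream


if __name__ == "__main__":
    # Hypothetical numbers: 16 map tasks running at once on one machine,
    # 10,000 reduce tasks, and an assumed 32 KiB buffer per stream.
    streams = hash_shuffle_open_streams(16, 10_000)
    total = hash_shuffle_buffer_bytes(16, 10_000, 32 * 1024)
    print(f"open streams: {streams}")
    print(f"buffer memory: {total / 2**30:.2f} GiB")
```

The point of the sketch is only that the cost grows with the product of concurrent map tasks and reduce tasks, so a large reduce-side fan-out multiplies whatever each stream costs.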

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ilya Ganelin
Hi Matei - I read your post with great interest. Could you possibly comment in more depth on some of the issues you guys saw when scaling up Spark and how you resolved them? I am interested specifically in Spark-related problems. I'm working on scaling up Spark to very large datasets and have been

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Henry Saputra
Congrats to Reynold et al leading this effort! - Henry On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been able to use Spark to > break MapReduc

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Steve Nunez
> Awesome news Matei ! > > Congratulations to the databricks team and all the community members... > > On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia > wrote: >> Hi folks, >> >> I interrupt your regularly schedul

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu wrote: > Great! Congratulations! > > -- > Nan Zhu > On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > >> Brilliant stuff ! Congrats all :-) >> This is indeed really heartening news ! >> >> Regards, >> Mridul >> >> >> On

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu
Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Dinesh J. Weerakkody
Wow.. Cool.. Congratulations.. :) On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska wrote: > This is a bad deal, great job. > > On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan > wrote: > > > Brilliant stuff ! Congrats all :-) > > This is indeed really heartening news ! > > > > Regards, > > Mri

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ted Malaska
This is a bad deal, great job. On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia > wrote: > > Hi folks, > > > > I interrupt your r

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been a

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Debasish Das
Awesome news Matei ! Congratulations to the databricks team and all the community members... On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some > pretty cool news for the project, which is that we've been