Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ilya Ganelin
Hi Matei, I read your post with great interest. Could you comment in more depth on some of the issues you saw when scaling up Spark and how you resolved them? I am specifically interested in Spark-related problems. I'm working on scaling up Spark to very large datasets and have been

Re: How to do broadcast join in SparkSQL

2014-10-10 Thread Jianshi Huang
It works fine, thanks for the help, Michael. Liancheng also told me a trick: using a subquery with LIMIT n. It works in the latest 1.2.0. BTW, it looks like the broadcast optimization won't be recognized if I do a left join instead of an inner join. Is that true? How can I make it work for left joins? Che

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Henry Saputra
Congrats to Reynold et al. for leading this effort! - Henry On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been able to use Spark to > break MapReduc

Re: new jenkins update + tentative release date

2014-10-10 Thread shane knapp
reminder: this IS happening, first thing monday morning PDT. :) On Wed, Oct 8, 2014 at 3:01 PM, shane knapp wrote: > greetings! > > i've got some updates regarding our new jenkins infrastructure, as well as > the initial date and plan for rolling things out: > > *** current testing/build break

Re: spark-prs and mesos/spark-ec2

2014-10-10 Thread Josh Rosen
I think this would require fairly significant refactoring of the PR board code. I’d love it if the PR board code was more easily configurable to support different JIRA / GitHub repositories, etc., but I don’t have the time to work on this myself. - Josh On October 9, 2014 at 6:20:12 PM, Nichol

Re: Trouble running tests

2014-10-10 Thread Nicholas Chammas
Running dev/run-tests as-is should work and will test everything. That's what the contributing guide recommends, if I remember correctly. At some point we should make it easier to test individual components locally using the dev script, but calling sbt on the various tests suites as Michael pointe

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Steve Nunez
Great stuff. Wonderful to see such progress in so short a time. How about some links to code and instructions so that these benchmarks can be reproduced? Regards, - Steve From: Debasish Das Date: Friday, October 10, 2014 at 8:17 To: Matei Zaharia Cc: user , dev Subject: Re: Breaking the

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu wrote: > Great! Congratulations! > > -- > Nan Zhu > On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > >> Brilliant stuff ! Congrats all :-) >> This is indeed really heartening news ! >> >> Regards, >> Mridul >> >> >> On

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu
Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia (mailto:matei.zaha...@gmail.com)

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Dinesh J. Weerakkody
Wow.. Cool.. Congratulations.. :) On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska wrote: > This is a bad deal, great job. > > On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan > wrote: > > > Brilliant stuff ! Congrats all :-) > > This is indeed really heartening news ! > > > > Regards, > > Mri

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ted Malaska
This is a bad deal, great job. On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia > wrote: > > Hi folks, > > > > I interrupt your r

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been a

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Debasish Das
Awesome news Matei ! Congratulations to the databricks team and all the community members... On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some > pretty cool news for the project, which is that we've been

Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Matei Zaharia
Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which is that we've been able to use Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x fewer nodes. There's a detailed writeup at http://datab