subject:"Re\: Surprising Spark SQL benchmark"

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Matei Zaharia

Ferrari and somehow forgetting to >> mention that the last record was held by a 2001 Toyota Celica. >> >> - Steve >> >> >> From: Nicholas Chammas >> Date: Wednesday, November 5, 2014 at 15:56 >> To: Steve Nunez >> Cc: Patrick Wende

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas

ecord >> at the Nürburgring in a 2014 1000hp LaFerrari and somehow forgetting to >> mention that the last record was held by a 2001 Toyota Celica. >> >> - Steve >> >> >> From: Nicholas Chammas >> Date: Wednesday, November 5, 2014 at 15:56 >> To: Stev

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Reynold Xin

> Date: Wednesday, November 5, 2014 at 15:56 > To: Steve Nunez > Cc: Patrick Wendell, dev > Subject: Re: Surprising Spark SQL benchmark > > > Steve Nunez, I believe the information behind the links below should > address > > your concerns earlier about Databricks'

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Steve Nunez

world record at the Nürburgring in a 2014 1000hp LaFerrari and somehow forgetting to mention that the last record was held by a 2001 Toyota Celica. - Steve From: Nicholas Chammas Date: Wednesday, November 5, 2014 at 15:56 To: Steve Nunez Cc: Patrick Wendell, dev Subject: Re: Surprising Spark

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas

Steve Nunez, I believe the information behind the links below should address your concerns earlier about Databricks's submission to the Daytona Gray benchmark. On Wed, Nov 5, 2014 at 6:43 PM, Nicholas Chammas wrote: > On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas < > nicholas.cham...@gmail.c

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas

On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: I believe that benchmark has a pending certification on it. See > http://sortbenchmark.org under "Process". > Regarding this comment, Reynold has just announced that this benchmark is now certified. - Announ

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Marco Slot

Hi Patrick, We left the details of the configuration of Spark that we used out of the blog post for brevity, but we're happy to share them. We've done quite a bit of tuning to find the configuration settings that gave us the best query times and run the most queries. I think there might still be a

Re: Surprising Spark SQL benchmark

2014-11-04 Thread Michael Armbrust

dev to bcc. Thanks for reaching out, Ozgun. Let's discuss if there were any missing optimizations off list. We'll make sure to report back or add any findings to the tuning guide. On Mon, Nov 3, 2014 at 3:01 PM, ozgun wrote: > Hey Patrick, > > It's Ozgun from Citus Data. We'd like to make the

Re: Surprising Spark SQL benchmark

2014-11-03 Thread ozgun

Hey Patrick, It's Ozgun from Citus Data. We'd like to make these benchmark results fair, and have tried different config settings for SparkSQL over the past month. We picked the best config settings we could find, and also contacted the Spark users list about running TPC-H numbers. http://goo.gl/

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Kay Ousterhout

Hi Nick, No -- we're doing a much more constrained thing of just trying to get things set up to easily run TPC-DS on SparkSQL (which involves generating the data, storing it in HDFS, getting all the queries in the right format, etc.). Cloudera does have a repo here: https://github.com/cloudera/imp

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas

Kay, Is this effort related to the existing AMPLab Big Data benchmark that covers Spark, Redshift, Tez, and Impala? Nick On Friday, October 31, 2014, Kay Ousterhout wrote: > There's been an effort in the AMPLab at Berkeley to set up a shared > codebase that makes it easy to run TPC-DS on SparkSQL, s

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas

Good points raised. Some comments. Re: #1 It seems like there is a misunderstanding of the purpose of the Daytona Gray benchmark. The purpose of the benchmark is to see how fast you can sort 100 TB of data (technically, your sort rate during the operation) using *any* hardware or software config,

Re: Surprising Spark SQL benchmark

2014-11-01 Thread RJ Nowling

Two thoughts here: 1. The real flaw with the sort benchmark was that Hadoop wasn't run on the same hardware. Given the advances in networking (availability of 10 Gb Ethernet) and disks (SSDs) since the Hadoop benchmarks it was compared to, it's an apples-to-oranges comparison. Without that, it does

Re: Surprising Spark SQL benchmark

2014-11-01 Thread arthur.hk.c...@gmail.com

Hi Kay, Thank you so much for your update!! Look forward to the shared code from AMPLab. As a member of the Spark community, I really hope that I could help to run TPC-DS on SparkSQL. At the moment, I am trying TPC-H 22 queries on SparkSQL 1.1.0 + Hive 0.12, and Hive 0.13.1 respectively (waiti

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Kay Ousterhout

There's been an effort in the AMPLab at Berkeley to set up a shared codebase that makes it easy to run TPC-DS on SparkSQL, since it's something we do frequently in the lab to evaluate new research. Based on this thread, it sounds like making this more widely-available is something that would be us

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas

I believe that benchmark has a pending certification on it. See http://sortbenchmark.org under "Process". It's true they did not share enough details on the blog for readers to reproduce the benchmark, but they will have to share enough with the committee behind the benchmark in order to be certif

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Steve Nunez

To be fair, we (Spark community) haven’t been any better, for example this benchmark: https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html For which no details or code have been released to allow others to reproduce it. I would encourage anyone doing a Spark benchmark in futur

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas

Thanks for the response, Patrick. I guess the key takeaways are 1) the tuning/config details are everything (they're not laid out here), 2) the benchmark should be reproducible (it's not), and 3) reach out to the relevant devs before publishing (didn't happen). Probably key takeaways for any kind

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Patrick Wendell

Hey Nick, Unfortunately Citus Data didn't contact any of the Spark or Spark SQL developers when running this. It is really easy to make one system look better than others when you are running a benchmark yourself because tuning and sizing can lead to a 10X performance improvement. This benchmark d

19 matches
