world record
at the Nürburgring in a 2014 1000hp LaFerrari and somehow forgetting to
mention that the last record was held by a 2001 Toyota Celica.
- Steve
From: Nicholas Chammas
Date: Wednesday, November 5, 2014 at 15:56
To: Steve Nunez
Cc: Patrick Wendell, dev
Subject: Re: Surprising Spark SQL benchmark
Steve Nunez, I believe the information behind the links below should
address your concerns earlier about Databricks's submission to the Daytona
Gray benchmark.
On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> I believe that benchmark has a pending certification on it. See
> http://sortbenchmark.org under "Process".
>
Regarding this comment, Reynold has just announced that this benchmark is
now certified.
- Announ
Hi Patrick,
We left the details of the configuration of Spark that we used out of the
blog post for brevity, but we're happy to share them. We've done quite a
bit of tuning to find the configuration settings that gave us the best
query times and run the most queries. I think there might still be a
If there are specific optimizations we should have applied and missed, we'd
love to be involved with the community in re-running the numbers.

Is this email thread the best place to continue the conversation?

Best,
Ozgun
Hi Nick,
No -- we're doing a much more constrained thing of just trying to get
things set up to easily run TPC-DS on SparkSQL (which involves generating
the data, storing it in HDFS, getting all the queries in the right format,
etc.).
Cloudera does have a repo here: https://github.com/cloudera/imp
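As a rough illustration of the setup Kay describes (a minimal sketch, not
the AMPLab codebase or Cloudera's kit), the pattern on Spark SQL 1.1 is to
point an external Hive table at the generated files in HDFS through a
HiveContext and then submit queries against it. The table columns, HDFS
path, and query below are placeholders, not the real TPC-DS schema or
query set.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TpcdsSetupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tpcds-setup-sketch"))
    val hive = new HiveContext(sc)

    // Register generated data that already sits in HDFS as an external table.
    // Columns and location are illustrative only.
    hive.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS store_sales (
        ss_sold_date_sk INT,
        ss_item_sk      INT,
        ss_quantity     INT,
        ss_net_paid     DOUBLE
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
      LOCATION 'hdfs:///benchmarks/tpcds/store_sales'
    """)

    // Run one hand-simplified query and force execution with collect().
    hive.sql(
      "SELECT ss_item_sk, SUM(ss_net_paid) AS total FROM store_sales " +
      "GROUP BY ss_item_sk ORDER BY total DESC LIMIT 20"
    ).collect().foreach(println)

    sc.stop()
  }
}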
Kay,
Is this effort related to the existing AMPLab Big Data benchmark that
covers Spark, Redshift, Tez, and Impala?
Nick
On Friday, October 31, 2014, Kay Ousterhout wrote:
> There's been an effort in the AMPLab at Berkeley to set up a shared
> codebase that makes it easy to run TPC-DS on SparkSQL, s
Good points raised. Some comments.
Re: #1
It seems like there is a misunderstanding of the purpose of the Daytona
Gray benchmark. The goal is to see how fast you can sort 100 TB of data
(technically, your sort rate during the operation) using *any* hardware or
software config,
Two thoughts here:
1. The real flaw with the sort benchmark was that Hadoop wasn't run on the
same hardware. Given the advances in networking (availability of
10Gb Ethernet) and disks (SSDs) since the Hadoop benchmarks it was compared
to, it's an apples-to-oranges comparison. Without that, it does
Hi Kay,
Thank you so much for your update!
I look forward to the shared code from AMPLab. As a member of the Spark
community, I really hope I can help run TPC-DS on SparkSQL. At the moment,
I am trying the 22 TPC-H queries on SparkSQL 1.1.0 + Hive 0.12 and Hive
0.13.1, respectively (waiti
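A hedged sketch of how such a run can be driven on Spark SQL 1.1 (this is
not any published harness): loop over the 22 query files, submit each
through a HiveContext, and time it. The query directory and file naming
are assumptions, and the TPC-H tables are assumed to be registered in the
Hive metastore already.

import java.io.File
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TpchTimingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tpch-timing-sketch"))
    val hive = new HiveContext(sc)

    // Assumes the 22 query files live under ./tpch-queries/ as q01.sql .. q22.sql.
    val queryFiles = new File("tpch-queries")
      .listFiles()
      .filter(_.getName.endsWith(".sql"))
      .sortBy(_.getName)

    for (file <- queryFiles) {
      val query = Source.fromFile(file).mkString
      val start = System.nanoTime()
      val rows  = hive.sql(query).collect().length  // force full execution
      val secs  = (System.nanoTime() - start) / 1e9
      println(f"${file.getName}%-10s returned $rows%8d rows in $secs%7.1f s")
    }

    sc.stop()
  }
}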
There's been an effort in the AMPLab at Berkeley to set up a shared
codebase that makes it easy to run TPC-DS on SparkSQL, since it's something
we do frequently in the lab to evaluate new research. Based on this
thread, it sounds like making this more widely available is something that
would be us
I believe that benchmark has a pending certification on it. See
http://sortbenchmark.org under "Process".
It's true they did not share enough details on the blog for readers to
reproduce the benchmark, but they will have to share enough with the
committee behind the benchmark in order to be certif
To be fair, we (the Spark community) haven't been any better; for example,
this benchmark:
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
for which no details or code have been released to allow others to
reproduce it. I would encourage anyone doing a Spark benchmark in futur
Thanks for the response, Patrick.
I guess the key takeaways are 1) the tuning/config details are everything
(they're not laid out here), 2) the benchmark should be reproducible (it's
not), and 3) reach out to the relevant devs before publishing (didn't
happen).
Probably key takeaways for any kind
Hey Nick,
Unfortunately Citus Data didn't contact any of the Spark or Spark SQL
developers when running this. It is really easy to make one system
look better than others when you are running a benchmark yourself
because tuning and sizing can lead to a 10X performance improvement.
This benchmark d
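To make the tuning-and-sizing point concrete, here is a hedged Spark SQL
1.1 example of the kind of knobs that swing query times: executor sizing,
serializer choice, shuffle parallelism, and table caching. The values are
illustrative placeholders rather than recommendations, and the cached table
name is hypothetical.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TuningSketch {
  def main(args: Array[String]): Unit = {
    // Cluster-sizing knobs; the numbers are placeholders, not recommendations.
    val conf = new SparkConf()
      .setAppName("sql-tuning-sketch")
      .set("spark.executor.memory", "8g")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext(conf)
    val hive = new HiveContext(sc)

    // Spark SQL knobs (Spark 1.1): shuffle parallelism and compressed
    // in-memory columnar storage both move query times substantially.
    hive.setConf("spark.sql.shuffle.partitions", "400")
    hive.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // Caching a hot table is another large lever; "nation" is hypothetical.
    hive.cacheTable("nation")

    sc.stop()
  }
}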
I know we don't want to be jumping at every benchmark someone posts out
there, but this one surprised me:
http://www.citusdata.com/blog/86-making-postgresql-scale-hadoop-style
This benchmark has Spark SQL failing to complete several queries in the
TPC-H benchmark. I don't understand much about th