Re: Run the "Big Data Benchmark" for new releases

Matei Zaharia Mon, 01 Sep 2014 22:05:38 -0700

Nope, actually, they didn't find that (they found some other things that were 
fixed, as well as some improvements). Feel free to send a PR, but it would be 
good to profile the issue first to understand what slowed down. (For example, is 
the map phase taking longer, or is it the reduce phase? Is there some difference 
in the lengths of specific tasks?)


Matei

On September 1, 2014 at 10:03:20 PM, Nicholas Chammas 
(nicholas.cham...@gmail.com) wrote:

Oh, that's sweet. So, a related question then. 

Did those tests pick up the performance issue reported in SPARK-3333? Does it 
make sense to add a new test to cover that case?


On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
Hi Nicholas,

At Databricks we already run https://github.com/databricks/spark-perf for each 
release, which is a more comprehensive performance test suite.

Matei

On September 1, 2014 at 8:22:05 PM, Nicholas Chammas 
(nicholas.cham...@gmail.com) wrote:

What do people think of running the Big Data Benchmark
<https://amplab.cs.berkeley.edu/benchmark/> (repo
<https://github.com/amplab/benchmark>) as part of preparing every new
release of Spark?

We'd run it just for Spark and effectively use it as another type of test
to track any performance progress or regressions from release to release.

Would doing such a thing be valuable? Do we already have a way of
benchmarking Spark performance that we use regularly?

Nick
