Nope, actually, they didn't find that (they found some other things that were fixed, as well as some improvements). Feel free to send a PR, but it would be good to profile the issue first to understand what slowed down. (For example, is the map phase taking longer or is it the reduce phase, is there some difference in the lengths of specific tasks, etc.)
Matei

On September 1, 2014 at 10:03:20 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:

Oh, that's sweet. So, a related question then.

Did those tests pick up the performance issue reported in SPARK-3333? Does it make sense to add a new test to cover that case?

On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Hi Nicholas,

At Databricks we already run https://github.com/databricks/spark-perf for each release, which is a more comprehensive performance test suite.

Matei

On September 1, 2014 at 8:22:05 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:

What do people think of running the Big Data Benchmark <https://amplab.cs.berkeley.edu/benchmark/> (repo <https://github.com/amplab/benchmark>) as part of preparing every new release of Spark?

We'd run it just for Spark and effectively use it as another type of test to track any performance progress or regressions from release to release.

Would doing such a thing be valuable? Do we already have a way of benchmarking Spark performance that we use regularly?

Nick
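
As a rough illustration of the profiling step suggested at the top of the thread (is the map phase or the reduce phase slower, and do specific tasks differ in length), here is a minimal sketch in plain Python. It does not call any Spark API. The function names `summarize` and `compare_stages` are hypothetical, and the task durations are made-up numbers, not measurements from SPARK-3333. In practice the durations would have to be collected by hand from wherever you record them, for instance the per-stage task tables in the Spark web UI.

```python
from statistics import median


def summarize(durations_ms):
    """Return task count, total, median, and max for a list of task durations (ms)."""
    return {
        "tasks": len(durations_ms),
        "total_ms": sum(durations_ms),
        "median_ms": median(durations_ms),
        "max_ms": max(durations_ms),
    }


def compare_stages(old_run, new_run):
    """Compare per-stage summaries for two runs of the same job.

    Each run maps a stage name (e.g. "map", "reduce") to a list of task
    durations in milliseconds. Stages missing from either run are skipped.
    """
    report = {}
    for stage, old_durations in old_run.items():
        if stage not in new_run:
            continue
        old = summarize(old_durations)
        new = summarize(new_run[stage])
        report[stage] = {
            "old": old,
            "new": new,
            # Ratio of total task time: values well above 1.0 point at this stage.
            "slowdown": new["total_ms"] / old["total_ms"],
        }
    return report


# Hypothetical, made-up durations for an older and a newer release.
old_run = {"map": [100, 110, 105, 95], "reduce": [200, 210, 190, 205]}
new_run = {"map": [102, 108, 107, 96], "reduce": [400, 390, 2000, 410]}

report = compare_stages(old_run, new_run)
for stage, r in report.items():
    print(
        f"{stage}: slowdown x{r['slowdown']:.2f}, "
        f"median {r['old']['median_ms']} -> {r['new']['median_ms']} ms, "
        f"max {r['old']['max_ms']} -> {r['new']['max_ms']} ms"
    )
```

Looking at the median and the max side by side helps separate a stage that got uniformly slower from one where a few straggler tasks dominate, which are the two cases the question about "lengths of specific tasks" is getting at.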