Alright, sounds good! I've created databricks/spark-perf/issues/9 <https://github.com/databricks/spark-perf/issues/9> as a reminder for us to add a new test once we've root caused SPARK-3333.
On Tue, Sep 2, 2014 at 1:07 AM, Patrick Wendell <pwend...@gmail.com> wrote:

> Yeah, this wasn't detected in our performance tests. We even have a
> test in PySpark that I would have thought might catch this (it just
> schedules a bunch of really small tasks, similar to the regression
> case).
>
> https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
>
> Anyways, Josh is trying to repro the regression to see if we can
> figure out what is going on. If we find something, for sure we should
> add a test.
>
> On Mon, Sep 1, 2014 at 10:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> > Nope, actually, they didn't find that (they found some other things that
> > were fixed, as well as some improvements). Feel free to send a PR, but it
> > would be good to profile the issue first to understand what slowed down.
> > (For example, is the map phase taking longer or is it the reduce phase? Is
> > there some difference in the lengths of specific tasks, etc.?)
> >
> > Matei
> >
> > On September 1, 2014 at 10:03:20 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:
> >
> > > Oh, that's sweet. So, a related question then.
> > >
> > > Did those tests pick up the performance issue reported in SPARK-3333?
> > > Does it make sense to add a new test to cover that case?
> > >
> > > On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> > >
> > > > Hi Nicholas,
> > > >
> > > > At Databricks we already run https://github.com/databricks/spark-perf
> > > > for each release, which is a more comprehensive performance test suite.
> > > >
> > > > Matei
> > > >
> > > > On September 1, 2014 at 8:22:05 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:
> > > >
> > > > > What do people think of running the Big Data Benchmark
> > > > > <https://amplab.cs.berkeley.edu/benchmark/> (repo
> > > > > <https://github.com/amplab/benchmark>) as part of preparing every new
> > > > > release of Spark?
> > > > >
> > > > > We'd run it just for Spark and effectively use it as another type of test
> > > > > to track any performance progress or regressions from release to release.
> > > > >
> > > > > Would doing such a thing be valuable? Do we already have a way of
> > > > > benchmarking Spark performance that we use regularly?
> > > > >
> > > > > Nick
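P.S. For anyone curious what a "schedule a bunch of really small tasks" throughput test looks like, here is a minimal plain-Python sketch of the general idea (this is NOT the spark-perf code linked above, just an illustration; the harness, function names, and trial counts are made up). The point is to time a large number of trivial tasks so that per-task overhead, rather than task work, dominates the measurement:

```python
import time

def run_tiny_tasks(num_tasks):
    """Run `num_tasks` trivial no-op tasks and return elapsed seconds.

    In a real scheduler-throughput test each iteration would be a tiny
    task submitted to the scheduler; here a no-op loop body stands in.
    """
    start = time.perf_counter()
    for _ in range(num_tasks):
        pass  # stand-in for a trivial task (e.g. a no-op map)
    return time.perf_counter() - start

def benchmark(num_tasks=10000, trials=5):
    """Return the median elapsed time over several trials.

    Taking the median across trials damps warm-up effects and outliers,
    which is how a regression between releases would be made visible.
    """
    times = sorted(run_tiny_tasks(num_tasks) for _ in range(trials))
    return times[len(times) // 2]
```

Comparing the median tasks/sec figure across two Spark releases is then what would surface a regression like the one reported in SPARK-3333.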