Alright, sounds good! I've created databricks/spark-perf/issues/9 <https://github.com/databricks/spark-perf/issues/9> as a reminder for us to add a new test once we've root caused SPARK-3333.
On Tue, Sep 2, 2014 at 1:07 AM, Patrick Wendell <pwend...@gmail.com> wrote:

> Yeah, this wasn't detected in our performance tests. We even have a
> test in PySpark that I would have thought might catch this (it just
> schedules a bunch of really small tasks, similar to the regression
> case).
>
> https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
>
> Anyways, Josh is trying to repro the regression to see if we can
> figure out what is going on. If we find something, for sure we should
> add a test.
>
> On Mon, Sep 1, 2014 at 10:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> > Nope, actually, they didn't find that (they found some other things that
> > were fixed, as well as some improvements). Feel free to send a PR, but it
> > would be good to profile the issue first to understand what slowed down.
> > (For example, is the map phase taking longer or is it the reduce phase? Is
> > there some difference in the lengths of specific tasks, etc.?)
> >
> > Matei
> >
> > On September 1, 2014 at 10:03:20 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:
> >
> > > Oh, that's sweet. So, a related question then.
> > >
> > > Did those tests pick up the performance issue reported in SPARK-3333?
> > > Does it make sense to add a new test to cover that case?
> > >
> > > On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> > >
> > > > Hi Nicholas,
> > > >
> > > > At Databricks we already run https://github.com/databricks/spark-perf
> > > > for each release, which is a more comprehensive performance test suite.
> > > >
> > > > Matei
> > > >
> > > > On September 1, 2014 at 8:22:05 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:
> > > >
> > > > > What do people think of running the Big Data Benchmark
> > > > > <https://amplab.cs.berkeley.edu/benchmark/> (repo
> > > > > <https://github.com/amplab/benchmark>) as part of preparing every new
> > > > > release of Spark?
> > > > >
> > > > > We'd run it just for Spark and effectively use it as another type of test
> > > > > to track any performance progress or regressions from release to release.
> > > > >
> > > > > Would doing such a thing be valuable? Do we already have a way of
> > > > > benchmarking Spark performance that we use regularly?
> > > > >
> > > > > Nick
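P.S. For anyone curious what a "schedule a bunch of really small tasks" throughput test looks like, here is a minimal plain-Python sketch of the general idea (this is NOT the spark-perf code linked above, just an illustration; the harness, function names, and trial counts are made up). The point is to time a large number of trivial tasks so that per-task overhead, rather than task work, dominates the measurement:

```python
import time

def run_tiny_tasks(num_tasks):
    """Run `num_tasks` trivial no-op tasks and return elapsed seconds.

    In a real scheduler-throughput test each iteration would be a tiny
    task submitted to the scheduler; here a no-op loop body stands in.
    """
    start = time.perf_counter()
    for _ in range(num_tasks):
        pass  # stand-in for a trivial task (e.g. a no-op map)
    return time.perf_counter() - start

def benchmark(num_tasks=10000, trials=5):
    """Return the median elapsed time over several trials.

    Taking the median across trials damps warm-up effects and outliers,
    which is how a regression between releases would be made visible.
    """
    times = sorted(run_tiny_tasks(num_tasks) for _ in range(trials))
    return times[len(times) // 2]
```

Comparing the median tasks/sec figure across two Spark releases is then what would surface a regression like the one reported in SPARK-3333.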