Based on some of my recent performance work, I'm growing
uncomfortable with using gcc as the performance baseline, since the
results can differ significantly (sometimes 3-4x or more on certain
fast algorithms) from clang and MSVC. The perf results on
https://github.com/apache/arrow/pull/7506 were really surprising --
some benchmarks that showed a 2-5x performance improvement on both
clang and MSVC showed small regressions (20-30%) with gcc.
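
To make the cross-compiler comparison concrete, here is a minimal,
hypothetical Google Benchmark kernel -- it is not code from the PR
above, just an illustration of the kind of tight loop whose
autovectorization (and therefore its numbers) can swing widely
between gcc, clang, and MSVC. Arrow's C++ benchmarks already use the
same Google Benchmark harness.

  // Hypothetical illustration, not code from PR 7506: a widening byte
  // sum, the sort of loop where vectorization quality depends heavily
  // on the compiler and version.
  #include <cstdint>
  #include <vector>

  #include <benchmark/benchmark.h>

  static int64_t SumBytes(const uint8_t* data, int64_t nbytes) {
    int64_t sum = 0;
    for (int64_t i = 0; i < nbytes; ++i) {
      sum += data[i];
    }
    return sum;
  }

  static void BM_SumBytes(benchmark::State& state) {
    std::vector<uint8_t> buffer(1 << 20, 0x2A);  // 1 MiB of arbitrary bytes
    for (auto _ : state) {
      benchmark::DoNotOptimize(SumBytes(buffer.data(), buffer.size()));
    }
    state.SetBytesProcessed(state.iterations() *
                            static_cast<int64_t>(buffer.size()));
  }
  BENCHMARK(BM_SumBytes);

  BENCHMARK_MAIN();

Compiling the same file once with gcc and once with clang (e.g. by
switching CMAKE_CXX_COMPILER) and comparing the reported bytes/second
is the kind of check the second criterion below is asking for.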

I don't think we need a hard-and-fast rule about whether to accept PRs
based on benchmarks, but there are a few guiding criteria:

* How much binary size does the new code add? I think many of us would
agree that a 20% performance increase on some algorithm might not be
worth adding 500KB to libarrow.so
* Is the code generally faster across the major compiler targets (gcc,
clang, MSVC)?

I think that using clang as a baseline for informational benchmarks
would be good, but ultimately we need to be systematically collecting
data on all the major compilers. Some time ago I proposed building a
Continuous Benchmarking framework
(https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
for use with Arrow (and outside of Arrow, too), so I hope that this
will be able to help.

- Wes

On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yibo....@arm.com> wrote:
>
> On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> >
> > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> >> There has been significant effort recently trying to optimize our C++
> >> code.  One  thing that seems to come up frequently is different benchmark
> >> results between GCC and Clang.  Even different versions of the same
> >> compiler can yield significantly different results on the same code.
> >>
> >> I would like to propose that we choose a specific compiler and version on
> >> Linux for evaluating performance related PRs.  PRs would only be accepted
> >> if they improve the benchmarks under the selected version.
> >
> > Would this be a hard rule or just a guideline?  There are many ways in
> > which benchmark numbers can be improved or deteriorated by a PR, and in
> > some cases that doesn't matter (benchmarks are not always realistic, and
> > they are not representative of every workload).
> >
>
> I agree that microbenchmarks are not always useful; focusing too much
> on improving microbenchmark results gives me a feeling of "overfitting"
> (to some specific microarchitecture, compiler, or use case).
>
> > Regards
> >
> > Antoine.
> >
