Hello.

This introduces a replacement for qsort() in GCC. The main selling point is
reproducibility (currently compiler output may change depending on how libc
qsort reorders not-bitwise-identical elements that compare equal) with a 
small improvement speed-wise and small code growth (under 2K on x86-64).

The opening comment in sort.cc gives a brief implementation overview:

/* This implements a sort function suitable for GCC use cases:
   - signature-compatible to C qsort, but relaxed contract:
     - may apply the comparator to elements in a temporary buffer
     - may abort on allocation failure
   - deterministic (but not necessarily stable)
   - fast, especially for common cases (0-5 elements of size 8 or 4)

   The implementation uses a network sort for up to 5 elements and
   a merge sort on top of that.  Neither stage has branches depending on
   comparator result, trading extra arithmetic for branch mispredictions.  */

I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 compilation.

Overall the new implementation is roughly 30% faster compared to Glibc qsort,
with 2x or more speedup for cases with tiny element count. I see one instance
where the new approach is significantly (1.5x) slower: it is ipa-icf.c:
sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 entries)
and needs 14 indirect loads just to reach values to compare, so when branch
prediction manages to guess correctly, it allows to overlap execution of two
comparators and better hide their cache miss latency.

Overall GCC spends about 0.3% time under qsort, but this doesn't automatically
mean that this brings a 0.1% speed improvement: it may be larger or smaller
depending on how new code affects cache behavior and branch predictors in
other code, and it's not trivial to measure precisely.

I can go into more detail about measured stats if there's interest :)

Makefile.in changes are separated to patch 2 in the hope it'd make review
easier, but the two patches will need to be applied together.

Bootstrapped/regtested on x86-64, OK for trunk?

Alexander Monakov (2):
  gcc_qsort: source code changes
  gcc_qsort: build system changes

 gcc/Makefile.in |   9 ++-
 gcc/sort.cc     | 232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/system.h    |   7 +-
 gcc/vec.c       |   2 +-
 4 files changed, 243 insertions(+), 7 deletions(-)
 create mode 100644 gcc/sort.cc

-- 
2.13.3

Reply via email to