On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov <amona...@ispras.ru> 
wrote:
>Hello.
>
>This introduces a replacement for qsort() in GCC. The main selling
>point is
>reproducibility (currently compiler output may change depending on how
>libc
>qsort reorders not-bitwise-identical elements that compare equal) with
>a 
>small improvement speed-wise and small code growth (under 2K on
>x86-64).
>
>The opening comment in sort.cc gives a brief implementation overview:
>
>/* This implements a sort function suitable for GCC use cases:
>   - signature-compatible to C qsort, but relaxed contract:
>     - may apply the comparator to elements in a temporary buffer

What consequences has this or rather how is this observable and makes 
comparators behave differently? 

Otherwise thanks for doing this. Will review tomorrow. 

Richard. 

>     - may abort on allocation failure
>   - deterministic (but not necessarily stable)
>   - fast, especially for common cases (0-5 elements of size 8 or 4)
>
>   The implementation uses a network sort for up to 5 elements and
>  a merge sort on top of that.  Neither stage has branches depending on
>comparator result, trading extra arithmetic for branch mispredictions. 
>*/
>
>I used a Sandy Bridge CPU to collect statistics on tramp3d -O2
>compilation.
>
>Overall the new implementation is roughly 30% faster compared to Glibc
>qsort,
>with 2x or more speedup for cases with tiny element count. I see one
>instance
>where the new approach is significantly (1.5x) slower: it is ipa-icf.c:
>sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500
>entries)
>and needs 14 indirect loads just to reach values to compare, so when
>branch
>prediction manages to guess correctly, it allows to overlap execution
>of two
>comparators and better hide their cache miss latency.
>
>Overall GCC spends about 0.3% time under qsort, but this doesn't
>automatically
>mean that this brings a 0.1% speed improvement: it may be larger or
>smaller
>depending on how new code affects cache behavior and branch predictors
>in
>other code, and it's not trivial to measure precisely.
>
>I can go into more detail about measured stats if there's interest :)
>
>Makefile.in changes are separated to patch 2 in the hope it'd make
>review
>easier, but the two patches will need to be applied together.
>
>Bootstrapped/regtested on x86-64, OK for trunk?
>
>Alexander Monakov (2):
>  gcc_qsort: source code changes
>  gcc_qsort: build system changes
>
> gcc/Makefile.in |   9 ++-
>gcc/sort.cc     | 232
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> gcc/system.h    |   7 +-
> gcc/vec.c       |   2 +-
> 4 files changed, 243 insertions(+), 7 deletions(-)
> create mode 100644 gcc/sort.cc

Reply via email to