On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov <amona...@ispras.ru> wrote: >Hello. > >This introduces a replacement for qsort() in GCC. The main selling >point is >reproducibility (currently compiler output may change depending on how >libc >qsort reorders not-bitwise-identical elements that compare equal) with >a >small improvement speed-wise and small code growth (under 2K on >x86-64). > >The opening comment in sort.cc gives a brief implementation overview: > >/* This implements a sort function suitable for GCC use cases: > - signature-compatible to C qsort, but relaxed contract: > - may apply the comparator to elements in a temporary buffer
What consequences has this or rather how is this observable and makes comparators behave differently? Otherwise thanks for doing this. Will review tomorrow. Richard. > - may abort on allocation failure > - deterministic (but not necessarily stable) > - fast, especially for common cases (0-5 elements of size 8 or 4) > > The implementation uses a network sort for up to 5 elements and > a merge sort on top of that. Neither stage has branches depending on >comparator result, trading extra arithmetic for branch mispredictions. >*/ > >I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 >compilation. > >Overall the new implementation is roughly 30% faster compared to Glibc >qsort, >with 2x or more speedup for cases with tiny element count. I see one >instance >where the new approach is significantly (1.5x) slower: it is ipa-icf.c: >sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 >entries) >and needs 14 indirect loads just to reach values to compare, so when >branch >prediction manages to guess correctly, it allows to overlap execution >of two >comparators and better hide their cache miss latency. > >Overall GCC spends about 0.3% time under qsort, but this doesn't >automatically >mean that this brings a 0.1% speed improvement: it may be larger or >smaller >depending on how new code affects cache behavior and branch predictors >in >other code, and it's not trivial to measure precisely. > >I can go into more detail about measured stats if there's interest :) > >Makefile.in changes are separated to patch 2 in the hope it'd make >review >easier, but the two patches will need to be applied together. > >Bootstrapped/regtested on x86-64, OK for trunk? > >Alexander Monakov (2): > gcc_qsort: source code changes > gcc_qsort: build system changes > > gcc/Makefile.in | 9 ++- >gcc/sort.cc | 232 >++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > gcc/system.h | 7 +- > gcc/vec.c | 2 +- > 4 files changed, 243 insertions(+), 7 deletions(-) > create mode 100644 gcc/sort.cc