Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

Wilco Dijkstra Thu, 12 Apr 2018 08:54:04 -0700

Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
>> Not sure if I missed some important part of the discussion but
>> for the testcase we want to preserve the tailcall, right?  So
>> it would be enough to set avoid_libcall to
>> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
>> stpcpy)?


The tailcall issue is just a distraction. Historically the handling of mempcpy  
has been horribly inefficient in both GCC and GLIBC for practically all targets.
This is why it was decided to defer to memcpy.

For example, a mempcpy with a small constant size was not expanded inline
like memcpy until PR70140 was fixed. Except for a few targets which have
added an optimized mempcpy, the default mempcpy implementation in almost all
released GLIBCs is much slower than memcpy (due to using a badly written
C implementation).

Recent GLIBCs now call the optimized memcpy - this is better but still adds
extra call/return overhead. To avoid that, the GLIBC headers have an inline
that changes any call to mempcpy into memcpy (this is the default but
can be disabled on a per-target basis).
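
As a rough illustration of that conversion (a minimal sketch, not the actual
GLIBC header code; the names mempcpy_via_memcpy and join2 are made up for
this example), mempcpy can be written as memcpy plus an add of the length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the inline from the GLIBC headers.
   memcpy returns dest, mempcpy returns dest + n, so the
   conversion is a memcpy call followed by one add. */
static inline void *mempcpy_via_memcpy(void *dest, const void *src, size_t n)
{
    return (char *)memcpy(dest, src, n) + n;
}

/* Chained copies, the typical use of mempcpy.
   Returns the number of bytes written, including the NUL. */
static size_t join2(char *out, const char *a, const char *b)
{
    char *p = out;
    p = mempcpy_via_memcpy(p, a, strlen(a));
    p = mempcpy_via_memcpy(p, b, strlen(b) + 1);  /* copy the NUL too */
    assert(p[-1] == '\0');
    return (size_t)(p - out);
}
```

The only cost over a direct mempcpy call is the add after the call; the copy
itself goes through whatever optimized memcpy the target provides.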

Obviously it is best to do this optimization in GCC, which is what we finally do
in GCC8. Inlining mempcpy means you sometimes miss a tailcall, but this is
not common - in all of GLIBC the inlining on AArch64 adds 166 extra instructions
and 12 callee-save registers. This is a small codesize cost to avoid the
overhead of calling the generic C version.
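
To make the missed tailcall concrete, here is a sketch under assumptions (the
function name copy_and_advance is hypothetical, and the exact code generated
depends on target and options). A function that ends in
"return mempcpy (dest, src, n);" can tail-call mempcpy. After the conversion
the add runs after memcpy returns, so the call is no longer in tail position:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example function. Written with mempcpy, the body
   would be a single call in tail position:
       return mempcpy (dest, src, n);
   In the converted form below, n is still needed after memcpy
   returns, so it typically has to survive the call in a
   callee-saved register or a stack slot. */
void *copy_and_advance(void *dest, const void *src, size_t n)
{
    char *start = memcpy(dest, src, n);
    return start + n;  /* work after the call: not a tailcall */
}
```

This is where the extra instructions and callee-save registers in the figures
above would come from.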

> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.

The question is who is going to write the 30+ mempcpy implementations for all
those targets which don't have one? And who says doing this is actually going
to improve performance? Having mempcpy+memcpy typically means more Icache
misses in code that uses both.

So generally it's a good idea to change mempcpy into memcpy by default. It's
not slower than calling mempcpy even if you have a fast implementation, it's
faster if you use an up-to-date GLIBC whose mempcpy calls memcpy, and it's
significantly better when using an old GLIBC.

Wilco
