On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> Not sure if I missed some important part of the discussion but
> for the testcase we want to preserve the tailcall, right?  So
> it would be enough to set avoid_libcall to
> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> stpcpy)?

For the testcase, yes.  The question there is whether some targets have
such a lame mempcpy that a tailcall to mempcpy is slower than avoiding
the tailcall (and on aarch64 it looked like the maintainers' choice to
have a lame mempcpy and hope the compiler will avoid it at all costs).
On the other hand, that change has been forced onto all targets, even
those that don't have a lame mempcpy.

So, the tailcall is one issue, and we can either use mempcpy if endp and
CALL_EXPR_TAILCALL, or only do that if -Os.

Another issue is mempcpy uses in other contexts.  Here again I think x86
has a good enough mempcpy that if I use
foo (mempcpy (x, y, z))
then it is better to use a mempcpy call than a memcpy call, but not so on
targets with a lame mempcpy.

My preference would be to have a non-lame mempcpy etc. on all targets,
but the aarch64 folks disagree.

So, I wonder e.g. about Martin's patch, which would use mempcpy if endp
and either FAST_SPEED for mempcpy (regardless of the context), or not
SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal that
they have such a lame mempcpy that they never want to use it (return
SLOW_SPEED), or ask for it to be used every time it makes sense from the
caller's POV, and have the default be something in between (only use it
in tail calls).

	Jakub
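
To make the proposed decision concrete, here is a rough standalone sketch
in plain C.  It is not GCC code and not Martin's patch: the enum, the
DEFAULT_SPEED name for the in-between default, and the function name are
hypothetical, and is_tailcall merely stands in for
CALL_EXPR_TAILCALL (exp).

```c
#include <stdbool.h>

/* Hypothetical stand-in for what a target would report about its
   mempcpy.  Only FAST_SPEED and SLOW_SPEED are named in the
   discussion; DEFAULT_SPEED is an assumed name for the default.  */
enum mempcpy_speed
{
  SLOW_SPEED,     /* mempcpy is so lame it should never be used */
  DEFAULT_SPEED,  /* assumed: the in-between default */
  FAST_SPEED      /* use mempcpy whenever the caller benefits */
};

/* Return true if a mempcpy libcall should be emitted.
   ENDP is nonzero when the caller wants the end pointer.
   IS_TAILCALL stands in for CALL_EXPR_TAILCALL (exp).  */
bool
use_mempcpy_libcall (enum mempcpy_speed speed, int endp, bool is_tailcall)
{
  /* Result is the start pointer: plain memcpy is enough.  */
  if (endp == 0)
    return false;

  /* Target says mempcpy is good: use it regardless of context.  */
  if (speed == FAST_SPEED)
    return true;

  /* Otherwise only to preserve a tail call, and never on targets
     that declared their mempcpy slow.  */
  return speed != SLOW_SPEED && is_tailcall;
}
```

Under these assumptions a target returning SLOW_SPEED never gets a
mempcpy call, a target returning FAST_SPEED gets one whenever the end
pointer is wanted, and the default gets one only in tail position.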