On 11/25/2016 01:07 AM, Richard Biener wrote:

For the tail-call issue, should we artificially create a lhs and use that
as return value (perhaps by a separate pass before tailcall) ?

__builtin_memcpy (a1, a2, a3);
return a1;

gets transformed to:
_1 = __builtin_memcpy (a1, a2, a3);
return _1;

So the tail-call optimization pass would see the IL in its expected form.

As said, a RTL expert needs to chime in here.  Iff then tail-call
itself should do this rewrite.  But if this form is required to make
things work (I suppose you checked it _does_ actually work?) then
we'd need to make sure later passes do not undo it.  So it looks
fragile to me.  OTOH I seem to remember that the flags we set on
GIMPLE are merely a hint to RTL expansion and the tailcalling is
verified again there?
So tail calling actually sits on the border between trees and RTL. Essentially it's an expand-time decision as we use information from trees as well as low level target information.

I would not expect the former sequence to tail call. The tail calling code does not know that the return value from memcpy will be a1. Thus the tail calling code has to assume that it'll have to copy a1 into the return register after returning from memcpy, which obviously can't be done if we tail called memcpy.

The second form is much more likely to turn into a tail call sequence because the return value from memcpy will be sitting in the proper register. This form ought to work for most calling conventions that allow tail calls.
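
To make the two forms concrete at the source level, here is a minimal C sketch. The function names (copy_discard_result, copy_forward_result, check_both_forms) are made up for illustration and do not come from the patch or the testsuite. The point is only that both functions return the same pointer, but in the second one the value being returned is the call's own result, so nothing has to happen after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Form 1 (hypothetical name): the call's result is dropped and a1 is
   returned separately.  Unless the compiler knows that memcpy returns
   its first argument, it has to keep a1 live across the call and move
   it into the return register afterwards, so the call is not in tail
   position.  */
void *
copy_discard_result (void *a1, const void *a2, size_t a3)
{
  memcpy (a1, a2, a3);
  return a1;
}

/* Form 2 (hypothetical name): the call's own result is returned, so it
   is already where the caller expects it and no work remains after the
   call.  */
void *
copy_forward_result (void *a1, const void *a2, size_t a3)
{
  return memcpy (a1, a2, a3);
}

/* Both forms are observably equivalent: each returns the destination
   pointer and leaves the same bytes in the destination buffer.  */
void
check_both_forms (void)
{
  char src[] = "tail";
  char d1[sizeof src] = { 0 };
  char d2[sizeof src] = { 0 };

  assert (copy_discard_result (d1, src, sizeof src) == d1);
  assert (copy_forward_result (d2, src, sizeof src) == d2);
  assert (strcmp (d1, d2) == 0);
}
```

Compiling this with something like gcc -O2 -S and comparing the two function bodies should show whether the call became a jump. What you actually see will depend on the target and the GCC version, so treat it as an experiment rather than a guaranteed result.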

We could (in theory) try and exploit the fact that memcpy returns its first argument as a return value, but that would only be helpful on a target where the first argument and return value use the same register. So I'd have a slight preference to rewriting per Prathamesh's suggestion above since it's more general.


Jeff
