On 25 November 2016 at 13:37, Richard Biener <rguent...@suse.de> wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 18:08, Richard Biener <rguent...@suse.de> wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 17:48, Richard Biener <rguent...@suse.de> wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 November 2016 at 14:07, Richard Biener <rguent...@suse.de> wrote:
>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >> Consider following test-case:
>> >> >> >>
>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> {
>> >> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >> >>   return a1;
>> >> >> >> }
>> >> >> >>
>> >> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> >> and the call could be emitted as a tail-call.
>> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> >> but if it is changed to:
>> >> >> >>
>> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> return t1;
>> >> >> >>
>> >> >> >> Then memcpy is emitted as a tail-call.
>> >> >> >> The attached patch tries to handle the former case.
>> >> >> >>
>> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> >> Does this patch look OK ?
>> >> >> >
>> >> >> > +/* Return arg, if function returns it's argument or NULL if it 
>> >> >> > doesn't.
>> >> >> > */
>> >> >> > +tree
>> >> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> >> > +{
>> >> >> >
>> >> >> >
>> >> >> > Please just inline it at the single use - the name is not terribly
>> >> >> > informative.
>> >> >> >
>> >> >> > I'm not sure you can rely on code-generation working if you not
>> >> >> > effectively change the IL to
>> >> >> >
>> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >   return a1;
>> >> >> >
>> >> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> >> > RTL needs to chime in.
>> >> >> Well I was trying to copy-propagate function's argument into uses of
>> >> >> it's return value if
>> >> >> function returned that argument, so the assignment to lhs of call
>> >> >> could be made redundant.
>> >> >>
>> >> >> eg:
>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> {
>> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >>   return t1;
>> >> >> }
>> >> >>
>> >> >> After patch, copyprop transformed it into:
>> >> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> return a1;
>> >> >
>> >> > But that's a bad transform -- if we know that t1 == a1 then it's
>> >> > better to use t1 as that's readily available in the return register
>> >> > while the register for a1 might have been clobbered and thus we
>> >> > need to spill it for the later return.
>> >> Oh I didn't realize this could possibly pessimize RA.
>> >> For test-case:
>> >>
>> >> void *t1 = memcpy (dest, src, n);
>> >> if (t1 != dest)
>> >>   __builtin_abort ();
>> >>
>> >> we could copy-propagate t1 into cond_expr and make the condition 
>> >> redundant.
>> >> However I suppose this particular case could be handled with VRP instead
>> >> (t1 and dest should be marked equivalent) ?
>> >
>> > Yeah, exposing this to value-numbering in general can enable some
>> > optimizations (but I wouldn't put it in copyprop).  Note it's then
>> > difficult to avoid copy-propgating things...
>> >
>> > The user can also write
>> >
>> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> > {
>> >   __builtin_memcpy (a1, a2, a3);
>> >   return a1;
>> > }
>> >
>> > so it's good to improve code-gen for that (for the tailcall issue).
>> For the tail-call, issue should we artificially create a lhs and use that
>> as return value (perhaps by a separate pass before tailcall) ?
>>
>> __builtin_memcpy (a1, a2, a3);
>> return a1;
>>
>> gets transformed to:
>> _1 = __builtin_memcpy (a1, a2, a3)
>> return _1;
>>
>> So tail-call optimization pass would see the IL in it's expected form.
>
> As said, a RTL expert needs to chime in here.  Iff then tail-call
> itself should do this rewrite.  But if this form is required to make
> things work (I suppose you checked it _does_ actually work?) then
> we'd need to make sure later passes do not undo it.  So it looks
> fragile to me.  OTOH I seem to remember that the flags we set on
> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> verified again there?

Yeah, I verified the form works:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

assembly:
f:
.LFB0:
        .cfi_startproc
        jmp     memcpy
        .cfi_endproc

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >> But this now interferes with tail-call optimization, because it is not
>> >> >> able to emit memcpy
>> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> >> >> I am not sure how to workaround this.
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> > Richard.
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguent...@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
>> >> > HRB 21284 (AG Nuernberg)
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguent...@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
>> > 21284 (AG Nuernberg)
>>
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

Reply via email to