On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > On 25 November 2016 at 21:17, Jeff Law <l...@redhat.com> wrote: > > On 11/25/2016 01:07 AM, Richard Biener wrote: > > > >>> For the tail-call, issue should we artificially create a lhs and use that > >>> as return value (perhaps by a separate pass before tailcall) ? > >>> > >>> __builtin_memcpy (a1, a2, a3); > >>> return a1; > >>> > >>> gets transformed to: > >>> _1 = __builtin_memcpy (a1, a2, a3) > >>> return _1; > >>> > >>> So tail-call optimization pass would see the IL in it's expected form. > >> > >> > >> As said, a RTL expert needs to chime in here. Iff then tail-call > >> itself should do this rewrite. But if this form is required to make > >> things work (I suppose you checked it _does_ actually work?) then > >> we'd need to make sure later passes do not undo it. So it looks > >> fragile to me. OTOH I seem to remember that the flags we set on > >> GIMPLE are merely a hint to RTL expansion and the tailcalling is > >> verified again there? > > > > So tail calling actually sits on the border between trees and RTL. > > Essentially it's an expand-time decision as we use information from trees as > > well as low level target information. > > > > I would not expect the former sequence to tail call. The tail calling code > > does not know that the return value from memcpy will be a1. Thus the tail > > calling code has to assume that it'll have to copy a1 into the return > > register after returning from memcpy, which obviously can't be done if we > > tail called memcpy. > > > > The second form is much more likely to turn into a tail call sequence > > because the return value from memcpy will be sitting in the proper register. > > This form out to work for most calling conventions that allow tail calls. > > > > We could (in theory) try and exploit the fact that memcpy returns its first > > argument as a return value, but that would only be helpful on a target where > > the first argument and return value use the same register. So I'd have a > > slight preference to rewriting per Prathamesh's suggestion above since it's > > more general. > Thanks for the suggestion. The attached patch creates artificial lhs, > and returns it if the function returns it's argument and that argument > is used as return-value. > > eg: > f (void * a1, void * a2, long unsigned int a3) > { > <bb 2> [0.0%]: > # .MEM_5 = VDEF <.MEM_1(D)> > __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > # VUSE <.MEM_5> > return a1_2(D); > > } > > is transformed to: > f (void * a1, void * a2, long unsigned int a3) > { > void * _6; > > <bb 2> [0.0%]: > # .MEM_5 = VDEF <.MEM_1(D)> > _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > # VUSE <.MEM_5> > return _6; > > } > > While testing, I came across an issue with function f() defined > intail-padding1.C: > struct X > { > ~X() {} > int n; > char d; > }; > > X f() > { > X nrvo; > __builtin_memset (&nrvo, 0, sizeof(X)); > return nrvo; > } > > input to the pass: > X f() () > { > <bb 2> [0.0%]: > # .MEM_3 = VDEF <.MEM_1(D)> > __builtin_memset (nrvo_2(D), 0, 8); > # VUSE <.MEM_3> > return nrvo_2(D); > > } > > verify_gimple_return failed with: > tail-padding1.C:13:1: error: invalid conversion in return statement > } > ^ > struct X > > struct X & > > # VUSE <.MEM_3> > return _4; > > It seems the return type of function (struct X) differs with the type > of return value (struct X&). > Not sure how this is possible ?
You need to honor DECL_BY_REFERENCE of DECL_RESULT. > To work around that, I guarded the transform on: > useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), > TREE_TYPE (retval))) > > in the patch. Does that look OK ? > > Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada. > Cross-tested on arm*-*-*, aarch64*-*-*. > > Thanks, > Prathamesh > > > > > > Jeff > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)