On 1 December 2016 at 18:38, Richard Biener <rguent...@suse.de> wrote: > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> On 1 December 2016 at 18:26, Richard Biener <rguent...@suse.de> wrote: >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: >> > >> >> On 1 December 2016 at 17:40, Richard Biener <rguent...@suse.de> wrote: >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 25 November 2016 at 21:17, Jeff Law <l...@redhat.com> wrote: >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: >> >> >> > >> >> >> >>> For the tail-call, issue should we artificially create a lhs and >> >> >> >>> use that >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ? >> >> >> >>> >> >> >> >>> __builtin_memcpy (a1, a2, a3); >> >> >> >>> return a1; >> >> >> >>> >> >> >> >>> gets transformed to: >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) >> >> >> >>> return _1; >> >> >> >>> >> >> >> >>> So tail-call optimization pass would see the IL in it's expected >> >> >> >>> form. >> >> >> >> >> >> >> >> >> >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call >> >> >> >> itself should do this rewrite. But if this form is required to make >> >> >> >> things work (I suppose you checked it _does_ actually work?) then >> >> >> >> we'd need to make sure later passes do not undo it. So it looks >> >> >> >> fragile to me. OTOH I seem to remember that the flags we set on >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is >> >> >> >> verified again there? >> >> >> > >> >> >> > So tail calling actually sits on the border between trees and RTL. >> >> >> > Essentially it's an expand-time decision as we use information from >> >> >> > trees as >> >> >> > well as low level target information. >> >> >> > >> >> >> > I would not expect the former sequence to tail call. The tail >> >> >> > calling code >> >> >> > does not know that the return value from memcpy will be a1. Thus >> >> >> > the tail >> >> >> > calling code has to assume that it'll have to copy a1 into the return >> >> >> > register after returning from memcpy, which obviously can't be done >> >> >> > if we >> >> >> > tail called memcpy. >> >> >> > >> >> >> > The second form is much more likely to turn into a tail call sequence >> >> >> > because the return value from memcpy will be sitting in the proper >> >> >> > register. >> >> >> > This form out to work for most calling conventions that allow tail >> >> >> > calls. >> >> >> > >> >> >> > We could (in theory) try and exploit the fact that memcpy returns >> >> >> > its first >> >> >> > argument as a return value, but that would only be helpful on a >> >> >> > target where >> >> >> > the first argument and return value use the same register. So I'd >> >> >> > have a >> >> >> > slight preference to rewriting per Prathamesh's suggestion above >> >> >> > since it's >> >> >> > more general. >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs, >> >> >> and returns it if the function returns it's argument and that argument >> >> >> is used as return-value. >> >> >> >> >> >> eg: >> >> >> f (void * a1, void * a2, long unsigned int a3) >> >> >> { >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> >> >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); >> >> >> # VUSE <.MEM_5> >> >> >> return a1_2(D); >> >> >> >> >> >> } >> >> >> >> >> >> is transformed to: >> >> >> f (void * a1, void * a2, long unsigned int a3) >> >> >> { >> >> >> void * _6; >> >> >> >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> >> >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); >> >> >> # VUSE <.MEM_5> >> >> >> return _6; >> >> >> >> >> >> } >> >> >> >> >> >> While testing, I came across an issue with function f() defined >> >> >> intail-padding1.C: >> >> >> struct X >> >> >> { >> >> >> ~X() {} >> >> >> int n; >> >> >> char d; >> >> >> }; >> >> >> >> >> >> X f() >> >> >> { >> >> >> X nrvo; >> >> >> __builtin_memset (&nrvo, 0, sizeof(X)); >> >> >> return nrvo; >> >> >> } >> >> >> >> >> >> input to the pass: >> >> >> X f() () >> >> >> { >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_3 = VDEF <.MEM_1(D)> >> >> >> __builtin_memset (nrvo_2(D), 0, 8); >> >> >> # VUSE <.MEM_3> >> >> >> return nrvo_2(D); >> >> >> >> >> >> } >> >> >> >> >> >> verify_gimple_return failed with: >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement >> >> >> } >> >> >> ^ >> >> >> struct X >> >> >> >> >> >> struct X & >> >> >> >> >> >> # VUSE <.MEM_3> >> >> >> return _4; >> >> >> >> >> >> It seems the return type of function (struct X) differs with the type >> >> >> of return value (struct X&). >> >> >> Not sure how this is possible ? >> >> > >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) >> >> resolved the error. >> >> Does the attached version look OK ? >> > >> > + ass_var = make_ssa_name (TREE_TYPE (arg)); >> > >> > can you try >> > >> > ass_var = copy_ssa_name (arg); >> > >> > instead? That way the underlying decl should make sure the >> > DECL_BY_REFERENCE check in the IL verification works. >> Done in the attached version and verified tail-padding1.C passes with >> the change. >> Does it look OK ? > > + if (!ass_var) > + { > + /* Check if function returns one if it's arguments > + and that argument is used as return value. > + In that case create an artificial lhs to call_stmt, > + and set it as the return value. */ > + > + unsigned rf = gimple_call_return_flags (call); > + if (rf & ERF_RETURNS_ARG) > + { > + unsigned argnum = rf & ERF_RETURN_ARG_MASK; > + if (argnum < gimple_call_num_args (call) > + && ret_stmt) > > check && ret_stmt above together with !ass_var. Oops, sorry about that. > > + { > + tree arg = gimple_call_arg (call, argnum); > + tree retval = gimple_return_retval (ret_stmt); > + if (retval > + && TREE_CODE (retval) == SSA_NAME > + && operand_equal_p (retval, arg, 0) > + && !DECL_BY_REFERENCE (DECL_RESULT > (cfun->decl))) > > the DECL_BY_REFERENCE check can be removed now (I think). Well after removing DECL_BY_REFERENCE, verify_gimple still fails but differently:
tail-padding1.C:13:1: error: RESULT_DECL should be read only when DECL_BY_REFERENCE is set } ^ while verifying SSA_NAME nrvo_4 in statement # .MEM_3 = VDEF <.MEM_1(D)> nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); tail-padding1.C:13:1: internal compiler error: verify_ssa failed Thanks, Prathamesh > > Richard. > >> Bootstrap+test in progress on x86_64-unknown-linux-gnu. >> >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Richard. >> > >> > >> >> Validation in progress. >> >> >> >> Thanks, >> >> Prathamesh >> >> > >> >> >> To work around that, I guarded the transform on: >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), >> >> >> TREE_TYPE (retval))) >> >> >> >> >> >> in the patch. Does that look OK ? >> >> >> >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with >> >> >> --enable-languages=all,ada. >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*. >> >> >> >> >> >> Thanks, >> >> >> Prathamesh >> >> >> > >> >> >> > >> >> >> > Jeff >> >> >> >> >> > >> >> > -- >> >> > Richard Biener <rguent...@suse.de> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, >> >> > HRB 21284 (AG Nuernberg) >> >> >> > >> > -- >> > Richard Biener <rguent...@suse.de> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB >> > 21284 (AG Nuernberg) >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)