On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > On 1 December 2016 at 18:26, Richard Biener <rguent...@suse.de> wrote: > > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > > > >> On 1 December 2016 at 17:40, Richard Biener <rguent...@suse.de> wrote: > >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 25 November 2016 at 21:17, Jeff Law <l...@redhat.com> wrote: > >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: > >> >> > > >> >> >>> For the tail-call, issue should we artificially create a lhs and > >> >> >>> use that > >> >> >>> as return value (perhaps by a separate pass before tailcall) ? > >> >> >>> > >> >> >>> __builtin_memcpy (a1, a2, a3); > >> >> >>> return a1; > >> >> >>> > >> >> >>> gets transformed to: > >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) > >> >> >>> return _1; > >> >> >>> > >> >> >>> So tail-call optimization pass would see the IL in it's expected > >> >> >>> form. > >> >> >> > >> >> >> > >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call > >> >> >> itself should do this rewrite. But if this form is required to make > >> >> >> things work (I suppose you checked it _does_ actually work?) then > >> >> >> we'd need to make sure later passes do not undo it. So it looks > >> >> >> fragile to me. OTOH I seem to remember that the flags we set on > >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is > >> >> >> verified again there? > >> >> > > >> >> > So tail calling actually sits on the border between trees and RTL. > >> >> > Essentially it's an expand-time decision as we use information from > >> >> > trees as > >> >> > well as low level target information. > >> >> > > >> >> > I would not expect the former sequence to tail call. The tail > >> >> > calling code > >> >> > does not know that the return value from memcpy will be a1. Thus the > >> >> > tail > >> >> > calling code has to assume that it'll have to copy a1 into the return > >> >> > register after returning from memcpy, which obviously can't be done > >> >> > if we > >> >> > tail called memcpy. > >> >> > > >> >> > The second form is much more likely to turn into a tail call sequence > >> >> > because the return value from memcpy will be sitting in the proper > >> >> > register. > >> >> > This form out to work for most calling conventions that allow tail > >> >> > calls. > >> >> > > >> >> > We could (in theory) try and exploit the fact that memcpy returns its > >> >> > first > >> >> > argument as a return value, but that would only be helpful on a > >> >> > target where > >> >> > the first argument and return value use the same register. So I'd > >> >> > have a > >> >> > slight preference to rewriting per Prathamesh's suggestion above > >> >> > since it's > >> >> > more general. > >> >> Thanks for the suggestion. The attached patch creates artificial lhs, > >> >> and returns it if the function returns it's argument and that argument > >> >> is used as return-value. > >> >> > >> >> eg: > >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> { > >> >> <bb 2> [0.0%]: > >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> # VUSE <.MEM_5> > >> >> return a1_2(D); > >> >> > >> >> } > >> >> > >> >> is transformed to: > >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> { > >> >> void * _6; > >> >> > >> >> <bb 2> [0.0%]: > >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> # VUSE <.MEM_5> > >> >> return _6; > >> >> > >> >> } > >> >> > >> >> While testing, I came across an issue with function f() defined > >> >> intail-padding1.C: > >> >> struct X > >> >> { > >> >> ~X() {} > >> >> int n; > >> >> char d; > >> >> }; > >> >> > >> >> X f() > >> >> { > >> >> X nrvo; > >> >> __builtin_memset (&nrvo, 0, sizeof(X)); > >> >> return nrvo; > >> >> } > >> >> > >> >> input to the pass: > >> >> X f() () > >> >> { > >> >> <bb 2> [0.0%]: > >> >> # .MEM_3 = VDEF <.MEM_1(D)> > >> >> __builtin_memset (nrvo_2(D), 0, 8); > >> >> # VUSE <.MEM_3> > >> >> return nrvo_2(D); > >> >> > >> >> } > >> >> > >> >> verify_gimple_return failed with: > >> >> tail-padding1.C:13:1: error: invalid conversion in return statement > >> >> } > >> >> ^ > >> >> struct X > >> >> > >> >> struct X & > >> >> > >> >> # VUSE <.MEM_3> > >> >> return _4; > >> >> > >> >> It seems the return type of function (struct X) differs with the type > >> >> of return value (struct X&). > >> >> Not sure how this is possible ? > >> > > >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. > >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) > >> resolved the error. > >> Does the attached version look OK ? > > > > + ass_var = make_ssa_name (TREE_TYPE (arg)); > > > > can you try > > > > ass_var = copy_ssa_name (arg); > > > > instead? That way the underlying decl should make sure the > > DECL_BY_REFERENCE check in the IL verification works. > Done in the attached version and verified tail-padding1.C passes with > the change. > Does it look OK ?
+ if (!ass_var) + { + /* Check if function returns one if it's arguments + and that argument is used as return value. + In that case create an artificial lhs to call_stmt, + and set it as the return value. */ + + unsigned rf = gimple_call_return_flags (call); + if (rf & ERF_RETURNS_ARG) + { + unsigned argnum = rf & ERF_RETURN_ARG_MASK; + if (argnum < gimple_call_num_args (call) + && ret_stmt) check && ret_stmt above together with !ass_var. + { + tree arg = gimple_call_arg (call, argnum); + tree retval = gimple_return_retval (ret_stmt); + if (retval + && TREE_CODE (retval) == SSA_NAME + && operand_equal_p (retval, arg, 0) + && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))) the DECL_BY_REFERENCE check can be removed now (I think). Richard. > Bootstrap+test in progress on x86_64-unknown-linux-gnu. > > Thanks, > Prathamesh > > > > Thanks, > > Richard. > > > > > >> Validation in progress. > >> > >> Thanks, > >> Prathamesh > >> > > >> >> To work around that, I guarded the transform on: > >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), > >> >> TREE_TYPE (retval))) > >> >> > >> >> in the patch. Does that look OK ? > >> >> > >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with > >> >> --enable-languages=all,ada. > >> >> Cross-tested on arm*-*-*, aarch64*-*-*. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> > > >> >> > Jeff > >> >> > >> > > >> > -- > >> > Richard Biener <rguent...@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, > >> > HRB 21284 (AG Nuernberg) > >> > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > > 21284 (AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)