On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:26, Richard Biener <rguent...@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 17:40, Richard Biener <rguent...@suse.de> wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 25 November 2016 at 21:17, Jeff Law <l...@redhat.com> wrote:
> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >
> >> >> >>> For the tail-call, issue should we artificially create a lhs and 
> >> >> >>> use that
> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >>>
> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >>> return a1;
> >> >> >>>
> >> >> >>> gets transformed to:
> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >>> return _1;
> >> >> >>>
> >> >> >>> So tail-call optimization pass would see the IL in it's expected 
> >> >> >>> form.
> >> >> >>
> >> >> >>
> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> verified again there?
> >> >> >
> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> > Essentially it's an expand-time decision as we use information from 
> >> >> > trees as
> >> >> > well as low level target information.
> >> >> >
> >> >> > I would not expect the former sequence to tail call.  The tail 
> >> >> > calling code
> >> >> > does not know that the return value from memcpy will be a1.  Thus the 
> >> >> > tail
> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> > register after returning from memcpy, which obviously can't be done 
> >> >> > if we
> >> >> > tail called memcpy.
> >> >> >
> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> > because the return value from memcpy will be sitting in the proper 
> >> >> > register.
> >> >> > This form out to work for most calling conventions that allow tail 
> >> >> > calls.
> >> >> >
> >> >> > We could (in theory) try and exploit the fact that memcpy returns its 
> >> >> > first
> >> >> > argument as a return value, but that would only be helpful on a 
> >> >> > target where
> >> >> > the first argument and return value use the same register. So I'd 
> >> >> > have a
> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
> >> >> > since it's
> >> >> > more general.
> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> and returns it if the function returns it's argument and that argument
> >> >> is used as return-value.
> >> >>
> >> >> eg:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return a1_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> is transformed to:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   void * _6;
> >> >>
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return _6;
> >> >>
> >> >> }
> >> >>
> >> >> While testing, I came across an issue with function f() defined
> >> >> intail-padding1.C:
> >> >> struct X
> >> >> {
> >> >>   ~X() {}
> >> >>   int n;
> >> >>   char d;
> >> >> };
> >> >>
> >> >> X f()
> >> >> {
> >> >>   X nrvo;
> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >>   return nrvo;
> >> >> }
> >> >>
> >> >> input to the pass:
> >> >> X f() ()
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >>   # VUSE <.MEM_3>
> >> >>   return nrvo_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> verify_gimple_return failed with:
> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >>  }
> >> >>  ^
> >> >> struct X
> >> >>
> >> >> struct X &
> >> >>
> >> >> # VUSE <.MEM_3>
> >> >> return _4;
> >> >>
> >> >> It seems the return type of function (struct X) differs with the type
> >> >> of return value (struct X&).
> >> >> Not sure how this is possible ?
> >> >
> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> resolved the error.
> >> Does the attached version look OK ?
> >
> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
> >
> > can you try
> >
> >       ass_var = copy_ssa_name (arg);
> >
> > instead?  That way the underlying decl should make sure the
> > DECL_BY_REFERENCE check in the IL verification works.
> Done in the attached version and verified tail-padding1.C passes with
> the change.
> Does it look OK ?

+         if (!ass_var)
+           {
+             /* Check if function returns one if it's arguments
+                and that argument is used as return value.
+                In that case create an artificial lhs to call_stmt,
+                and set it as the return value.  */
+
+             unsigned rf = gimple_call_return_flags (call);
+             if (rf & ERF_RETURNS_ARG)
+               {
+                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+                 if (argnum < gimple_call_num_args (call)
+                     && ret_stmt)

check && ret_stmt above together with !ass_var.

+                   {
+                     tree arg = gimple_call_arg (call, argnum);
+                     tree retval = gimple_return_retval (ret_stmt);
+                     if (retval
+                         && TREE_CODE (retval) == SSA_NAME
+                         && operand_equal_p (retval, arg, 0)
+                         && !DECL_BY_REFERENCE (DECL_RESULT 
(cfun->decl)))

the DECL_BY_REFERENCE check can be removed now (I think).

Richard.

> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >
> >> Validation in progress.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >> To work around that, I guarded the transform on:
> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >> >>                                              TREE_TYPE (retval)))
> >> >>
> >> >> in the patch. Does that look OK ?
> >> >>
> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with 
> >> >> --enable-languages=all,ada.
> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >
> >> >> > Jeff
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguent...@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
> >> > HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to