On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
> <wschm...@linux.vnet.ibm.com> wrote:
>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com> 
>> wrote:
>>> 
>>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>>> 
>>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>>>> <will_schm...@vnet.ibm.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>>>> 
>>>>> Add code to handle gimple folding for the vec_ld builtins.
>>>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
>>>>> Surrounding
>>>>> comments have been adjusted slightly so they continue to read OK for the
>>>>> existing vec_st code.
>>>>> 
>>>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
>>>>> tests which have been posted separately.
>>>>> 
>>>>> For V2 of this patch, I've removed the chunk of code that prohibited the
>>>>> gimple fold from occurring in BE environments.  This had fixed an issue
>>>>> for me earlier during my development of the code, but it turns out it was
>>>>> not necessary.  I've sniff-tested after removing that check and it looks
>>>>> OK.
>>>>> 
>>>>>> + /* Limit folding of loads to LE targets.  */
>>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>>>>>> +        return false;
>>>>> 
>>>>> I've restarted a regression test on this updated version.
>>>>> 
>>>>> OK for trunk (assuming successful regression test completion)?
>>>>> 
>>>>> Thanks,
>>>>> -Will
>>>>> 
>>>>> [gcc]
>>>>> 
>>>>>      2017-09-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>>>>> 
>>>>>      * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>>>>      for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>>>>      * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>>>>      Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>>>>> 
>>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>>>> index fbab0a2..bb8a77d 100644
>>>>> --- a/gcc/config/rs6000/rs6000-c.c
>>>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
>>>>> loc, tree fndecl,
>>>>>                   convert (TREE_TYPE (stmt), arg0));
>>>>>     stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>>>     return stmt;
>>>>>   }
>>>>> 
>>>>> -  /* Expand vec_ld into an expression that masks the address and
>>>>> -     performs the load.  We need to expand this early to allow
>>>>> +  /* Expand vec_st into an expression that masks the address and
>>>>> +     performs the store.  We need to expand this early to allow
>>>>>    the best aliasing, as by the time we get into RTL we no longer
>>>>>    are able to honor __restrict__, for example.  We may want to
>>>>>    consider this for all memory access built-ins.
>>>>> 
>>>>>    When -maltivec=be is specified, or the wrong number of arguments
>>>>>    is provided, simply punt to existing built-in processing.  */
>>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>>> -      && nargs == 2)
>>>>> -    {
>>>>> -      tree arg0 = (*arglist)[0];
>>>>> -      tree arg1 = (*arglist)[1];
>>>>> -
>>>>> -      /* Strip qualifiers like "const" from the pointer arg.  */
>>>>> -      tree arg1_type = TREE_TYPE (arg1);
>>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
>>>>> ARRAY_TYPE)
>>>>> -       goto bad;
>>>>> -
>>>>> -      tree inner_type = TREE_TYPE (arg1_type);
>>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>>>>> -       {
>>>>> -         arg1_type = build_pointer_type (build_qualified_type 
>>>>> (inner_type,
>>>>> -                                                               0));
>>>>> -         arg1 = fold_convert (arg1_type, arg1);
>>>>> -       }
>>>>> -
>>>>> -      /* Construct the masked address.  Let existing error handling take
>>>>> -        over if we don't have a constant offset.  */
>>>>> -      arg0 = fold (arg0);
>>>>> -
>>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)
>>>>> -       {
>>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))
>>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);
>>>>> -
>>>>> -         tree arg1_type = TREE_TYPE (arg1);
>>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>>>>> -           {
>>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>>>>> -             tree const0 = build_int_cstu (sizetype, 0);
>>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>>>>> -           }
>>>>> -
>>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
>>>>> -                                      arg1, arg0);
>>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, 
>>>>> addr,
>>>>> -                                         build_int_cst (arg1_type, -16));
>>>>> -
>>>>> -         /* Find the built-in to get the return type so we can convert
>>>>> -            the result properly (or fall back to default handling if the
>>>>> -            arguments aren't compatible).  */
>>>>> -         for (desc = altivec_overloaded_builtins;
>>>>> -              desc->code && desc->code != fcode; desc++)
>>>>> -           continue;
>>>>> -
>>>>> -         for (; desc->code == fcode; desc++)
>>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), 
>>>>> desc->op1)
>>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
>>>>> -                                                   desc->op2)))
>>>>> -             {
>>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);
>>>>> -               if (TYPE_MODE (ret_type) == V2DImode)
>>>>> -                 /* Type-based aliasing analysis thinks vector long
>>>>> -                    and vector long long are different and will put them
>>>>> -                    in distinct alias classes.  Force our return type
>>>>> -                    to be a may-alias type to avoid this.  */
>>>>> -                 ret_type
>>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,
>>>>> -                                                  true/*can_alias_all*/);
>>>>> -               else
>>>>> -                 ret_type = build_pointer_type (ret_type);
>>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);
>>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);
>>>>> -               return ret_val;
>>>>> -             }
>>>>> -       }
>>>>> -    }
>>>>> 
>>>>> -  /* Similarly for stvx.  */
>>>>> if (fcode == ALTIVEC_BUILTIN_VEC_ST
>>>>>     && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>>>     && nargs == 3)
>>>>>   {
>>>>>     tree arg0 = (*arglist)[0];
>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>>> index 1338371..1fb5f44 100644
>>>>> --- a/gcc/config/rs6000/rs6000.c
>>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin 
>>>>> (gimple_stmt_iterator *gsi)
>>>>>      res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
>>>>>      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>>      update_call_from_tree (gsi, res);
>>>>>      return true;
>>>>>     }
>>>>> +    /* Vector loads.  */
>>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:
>>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:
>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:
>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:
>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:
>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:
>>>>> +      {
>>>>> +        gimple *g;
>>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset
>>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address
>>>>> +
>>>>> +        lhs = gimple_call_lhs (stmt);
>>>>> +        location_t loc = gimple_location (stmt);
>>>>> +
>>>>> +        tree arg1_type = TREE_TYPE (arg1);
>>>>> +        tree lhs_type = TREE_TYPE (lhs);
>>>>> +
>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
>>>>> Create
>>>>> +           the tree using the value from arg0.  The resulting type will 
>>>>> match
>>>>> +           the type of arg1.  */
>>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);
>>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,
>>>>> +                                 temp_offset);
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>>> +
>>>>> +        /* Mask off any lower bits from the address.  */
>>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);
>>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,
>>>>> +                                temp_addr, alignment_mask);
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>> 
>>>> You could use
>>>> 
>>>> gimple_seq stmts = NULL;
>>>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>> arg1_type, arg1, temp_offset);
>>>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>>>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));
>>>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>> 
>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF 
>>>>> could also
>>>>> +           take an offset, but since we've already incorporated the 
>>>>> offset
>>>>> +           above, here we just pass in a zero.  */
>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, 
>>>>> aligned_addr,
>>>>> +                                               build_int_cst (arg1_type, 
>>>>> 0)));
>>>> 
>>>> are you sure about arg1_type here?  I'm sure not.  For
>>>> 
>>>> ... foo (struct S *p)
>>>> {
>>>> return __builtin_lvx_v2df (4, (double *)p);
>>>> }
>>>> 
>>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus
>>>> TBAA using 'struct S' to access the memory.
>>> 
>>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?
>>> Will, you should probably test this example and see, but I'm pretty 
>>> confident
>>> about this (see below).
>> 
>> But, as I should have suspected, you're right.  For some reason
>> gimple_call_arg is returning p, stripped of the cast by which the user
>> asserted that p points to a double.
>> 
>> Can you explain to me why this should be so?  I assume that somebody
>> has decided to strip_nops the argument and lose the cast.
> 
> pointer types have no meaning in GIMPLE so we aggressively prune them.
> 
>> Using ptr_type_node loses all type information, so that would be a
>> regression from what we do today.  In some cases we could reconstruct
>> that this was necessarily, say, a double*, but I don't know how we would
>> recover the signedness for an integer type.
> 
> How did we handle the expansion previously - ah - it was done earlier
> in the C FE.  So why are you moving it to GIMPLE?  The function is called
> resolve_overloaded_builtin - what kind of overloading do you resolve here?
> As said argument types might not be preserved.

The AltiVec builtins allow overloaded names based on the argument types,
using a special callout during parsing to convert the overloaded names to
type-specific names.  Historically these have then remained builtin calls
until RTL expansion, which forfeits a lot of useful optimization.  Will has
been gradually implementing gimple folding for these builtins so that we can
optimize simple vector arithmetic and so on.  The overloading is still dealt
with during parsing.

As an example:

  double a[64];
  vector double x = vec_ld (0, a);

will get translated into

  vector double x = __builtin_altivec_lvx_v2df (0, a);

and 

  unsigned char b[64];
  vector unsigned char y = vec_ld (0, b);

will get translated into

  vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);

So in resolving the overloading we still maintain the type info for arg1.

Earlier I had dealt with the performance issue in a different way for the 
vec_ld and vec_st overloaded builtins, which created the rather grotty 
code in rs6000-c.c to modify the parse trees instead.  My hope was that
we could simplify the code by having Will deal with them as gimple folds
instead.  But if in so doing we lose type information, that may not be the
right call.

However, since you say that gimple aggressively removes the casts
from pointer types, perhaps the code that we see in early gimple from
the existing method is also missing the type information?  Will, it
would be worth looking at that code to see.  If it's no different, then
perhaps we should still go ahead with the folding.

Another note for Will:  The existing code gives up when -maltivec=be has
been specified, and you probably want to do that as well.  That may be
why you initially turned off big endian -- it is easy to misread that code.
-maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.

Thanks,
Bill
> 
> Richard.
> 
>> Bill
>>> 
>>>> 
>>>> I think if the builtins have any TBAA constraints you need to build those
>>>> explicitly; if not, you should use ptr_type_node aka no TBAA.
>>> 
>>> The type signatures are constrained during parsing, so we should only
>>> see allowed pointer types on arg1 by the time we get to gimple folding.  I
>>> think that using arg1_type should work, but I am probably missing
>>> something subtle, so please feel free to whack me on the temple until
>>> I get it. :-)
>>> 
>>> Bill
>>>> 
>>>> Richard.
>>>> 
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_replace (gsi, g, true);
>>>>> +
>>>>> +        return true;
>>>>> +
>>>>> +      }
>>>>> +
>>>>>   default:
>>>>>      if (TARGET_DEBUG_BUILTIN)
>>>>>         fprintf (stderr, "gimple builtin intrinsic not matched:%d %s 
>>>>> %s\n",
>>>>>                  fn_code, fn_name1, fn_name2);
>>>>>     break;
