On Sep 15, 2017, at 4:13 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Thu, Sep 14, 2017 at 4:38 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
>> On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>>>
>>> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
>>>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
>>>>>
>>>>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>>>>>>
>>>>>>> Add code to handle gimple folding for the vec_ld builtins.
>>>>>>> Remove the now-obsolete folding code for vec_ld from rs6000-c.c.
>>>>>>> Surrounding comments have been adjusted slightly so they continue
>>>>>>> to read correctly for the existing vec_st code.
>>>>>>>
>>>>>>> The resulting code is specifically verified by the
>>>>>>> powerpc/fold-vec-ld-*.c tests, which have been posted separately.
>>>>>>>
>>>>>>> For v2 of this patch, I've removed the chunk of code that prohibited
>>>>>>> the gimple fold from occurring in BE environments.  That check had
>>>>>>> fixed an issue for me earlier during development, but it turns out
>>>>>>> it was not necessary.  I've sniff-tested after removing it and the
>>>>>>> results look OK.
>>>>>>>
>>>>>>>> +      /* Limit folding of loads to LE targets.  */
>>>>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>>>>>>>> +        return false;
>>>>>>>
>>>>>>> I've restarted a regression test on this updated version.
>>>>>>>
>>>>>>> OK for trunk (assuming successful regression test completion)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Will
>>>>>>>
>>>>>>> [gcc]
>>>>>>>
>>>>>>> 2017-09-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>>>>>>>
>>>>>>>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>>>>>>         for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>>>>>>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>>>>>>         Remove obsolete code for handling ALTIVEC_BUILTIN_VEC_LD.
>>>>>>>
>>>>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>>>>>> index fbab0a2..bb8a77d 100644
>>>>>>> --- a/gcc/config/rs6000/rs6000-c.c
>>>>>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>>>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>>>>>>>                          convert (TREE_TYPE (stmt), arg0));
>>>>>>>        stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>>>>>        return stmt;
>>>>>>>      }
>>>>>>>
>>>>>>> -  /* Expand vec_ld into an expression that masks the address and
>>>>>>> -     performs the load.  We need to expand this early to allow
>>>>>>> +  /* Expand vec_st into an expression that masks the address and
>>>>>>> +     performs the store.  We need to expand this early to allow
>>>>>>>       the best aliasing, as by the time we get into RTL we no longer
>>>>>>>       are able to honor __restrict__, for example.  We may want to
>>>>>>>       consider this for all memory access built-ins.
>>>>>>>
>>>>>>>       When -maltivec=be is specified, or the wrong number of arguments
>>>>>>>       is provided, simply punt to existing built-in processing.  */
>>>>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>>>>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>>>>> -      && nargs == 2)
>>>>>>> -    {
>>>>>>> -      tree arg0 = (*arglist)[0];
>>>>>>> -      tree arg1 = (*arglist)[1];
>>>>>>> -
>>>>>>> -      /* Strip qualifiers like "const" from the pointer arg.  */
>>>>>>> -      tree arg1_type = TREE_TYPE (arg1);
>>>>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)
>>>>>>> -        goto bad;
>>>>>>> -
>>>>>>> -      tree inner_type = TREE_TYPE (arg1_type);
>>>>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>>>>>>> -        {
>>>>>>> -          arg1_type = build_pointer_type (build_qualified_type (inner_type,
>>>>>>> -                                                                0));
>>>>>>> -          arg1 = fold_convert (arg1_type, arg1);
>>>>>>> -        }
>>>>>>> -
>>>>>>> -      /* Construct the masked address.  Let existing error handling take
>>>>>>> -         over if we don't have a constant offset.  */
>>>>>>> -      arg0 = fold (arg0);
>>>>>>> -
>>>>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)
>>>>>>> -        {
>>>>>>> -          if (!ptrofftype_p (TREE_TYPE (arg0)))
>>>>>>> -            arg0 = build1 (NOP_EXPR, sizetype, arg0);
>>>>>>> -
>>>>>>> -          tree arg1_type = TREE_TYPE (arg1);
>>>>>>> -          if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>>>>>>> -            {
>>>>>>> -              arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>>>>>>> -              tree const0 = build_int_cstu (sizetype, 0);
>>>>>>> -              tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>>>>>>> -              arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>>>>>>> -            }
>>>>>>> -
>>>>>>> -          tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
>>>>>>> -                                       arg1, arg0);
>>>>>>> -          tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
>>>>>>> -                                          build_int_cst (arg1_type, -16));
>>>>>>> -
>>>>>>> -          /* Find the built-in to get the return type so we can convert
>>>>>>> -             the result properly (or fall back to default handling if the
>>>>>>> -             arguments aren't compatible).  */
>>>>>>> -          for (desc = altivec_overloaded_builtins;
>>>>>>> -               desc->code && desc->code != fcode; desc++)
>>>>>>> -            continue;
>>>>>>> -
>>>>>>> -          for (; desc->code == fcode; desc++)
>>>>>>> -            if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
>>>>>>> -                && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
>>>>>>> -                                                    desc->op2)))
>>>>>>> -              {
>>>>>>> -                tree ret_type = rs6000_builtin_type (desc->ret_type);
>>>>>>> -                if (TYPE_MODE (ret_type) == V2DImode)
>>>>>>> -                  /* Type-based aliasing analysis thinks vector long
>>>>>>> -                     and vector long long are different and will put them
>>>>>>> -                     in distinct alias classes.  Force our return type
>>>>>>> -                     to be a may-alias type to avoid this.  */
>>>>>>> -                  ret_type
>>>>>>> -                    = build_pointer_type_for_mode (ret_type, Pmode,
>>>>>>> -                                                   true/*can_alias_all*/);
>>>>>>> -                else
>>>>>>> -                  ret_type = build_pointer_type (ret_type);
>>>>>>> -                aligned = build1 (NOP_EXPR, ret_type, aligned);
>>>>>>> -                tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);
>>>>>>> -                return ret_val;
>>>>>>> -              }
>>>>>>> -        }
>>>>>>> -    }
>>>>>>>
>>>>>>> -  /* Similarly for stvx.  */
>>>>>>>    if (fcode == ALTIVEC_BUILTIN_VEC_ST
>>>>>>>        && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>>>>>        && nargs == 3)
>>>>>>>      {
>>>>>>>        tree arg0 = (*arglist)[0];
>>>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>>>>> index 1338371..1fb5f44 100644
>>>>>>> --- a/gcc/config/rs6000/rs6000.c
>>>>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>>>>        res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
>>>>>>>        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>>>>        update_call_from_tree (gsi, res);
>>>>>>>        return true;
>>>>>>>      }
>>>>>>> +    /* Vector loads.  */
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:
>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:
>>>>>>> +      {
>>>>>>> +        gimple *g;
>>>>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset
>>>>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address
>>>>>>> +
>>>>>>> +        lhs = gimple_call_lhs (stmt);
>>>>>>> +        location_t loc = gimple_location (stmt);
>>>>>>> +
>>>>>>> +        tree arg1_type = TREE_TYPE (arg1);
>>>>>>> +        tree lhs_type = TREE_TYPE (lhs);
>>>>>>> +
>>>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
>>>>>>> +           Create the tree using the value from arg0.  The resulting type
>>>>>>> +           will match the type of arg1.  */
>>>>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);
>>>>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);
>>>>>>> +        gimple_set_location (g, loc);
>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,
>>>>>>> +                                 temp_offset);
>>>>>>> +        gimple_set_location (g, loc);
>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>>>>> +
>>>>>>> +        /* Mask off any lower bits from the address.  */
>>>>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);
>>>>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,
>>>>>>> +                                 temp_addr, alignment_mask);
>>>>>>> +        gimple_set_location (g, loc);
>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>>>>
>>>>>> You could use
>>>>>>
>>>>>>   gimple_seq stmts = NULL;
>>>>>>   tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>>>>   tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>>>>                                  arg1_type, arg1, temp_offset);
>>>>>>   tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>>>>>>                                     arg1_type, temp_addr,
>>>>>>                                     build_int_cst (arg1_type, -16));
>>>>>>   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>>>
>>>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF
>>>>>>> +           could also take an offset, but since we've already incorporated
>>>>>>> +           the offset above, here we just pass in a zero.  */
>>>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
>>>>>>> +                                              build_int_cst (arg1_type, 0)));
>>>>>>
>>>>>> are you sure about arg1_type here?  I'm sure not.  For
>>>>>>
>>>>>>   ... foo (struct S *p)
>>>>>>   {
>>>>>>     return __builtin_lvx_v2df (4, (double *)p);
>>>>>>   }
>>>>>>
>>>>>> you'd end up with p as arg1 and thus struct S * as arg1_type, and thus
>>>>>> TBAA using 'struct S' to access the memory.
>>>>>
>>>>> Hm, is that so?
>>>>> Wouldn't arg1_type be double* since arg1 is (double *)p?
>>>>> Will, you should probably test this example and see, but I'm pretty
>>>>> confident about this (see below).
>>>>
>>>> But, as I should have suspected, you're right.  For some reason
>>>> gimple_call_arg is returning p, stripped of the cast information where
>>>> the user asserted that p points to a double*.
>>>>
>>>> Can you explain to me why this should be so?  I assume that somebody
>>>> has decided to strip_nops the argument and lose the cast.
>>>
>>> Pointer types have no meaning in GIMPLE, so we aggressively prune them.
>>>
>>>> Using ptr_type_node loses all type information, so that would be a
>>>> regression from what we do today.  In some cases we could reconstruct
>>>> that this was necessarily, say, a double*, but I don't know how we would
>>>> recover the signedness for an integer type.
>>>
>>> How did we handle the expansion previously - ah - it was done earlier
>>> in the C FE.  So why are you moving it to GIMPLE?  The function is called
>>> resolve_overloaded_builtin - what kind of overloading do you resolve here?
>>> As said, argument types might not be preserved.
>>
>> The AltiVec builtins allow overloaded names based on the argument types,
>> using a special callout during parsing to convert the overloaded names to
>> type-specific names.  Historically these have then remained builtin calls
>> until RTL expansion, which loses a lot of useful optimization.  Will has
>> been gradually implementing gimple folding for these builtins so that we
>> can optimize simple vector arithmetic and so on.  The overloading is still
>> dealt with during parsing.
>>
>> As an example:
>>
>>   double a[64];
>>   vector double x = vec_ld (0, a);
>>
>> will get translated into
>>
>>   vector double x = __builtin_altivec_lvx_v2df (0, a);
>>
>> and
>>
>>   unsigned char b[64];
>>   vector unsigned char y = vec_ld (0, b);
>>
>> will get translated into
>>
>>   vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);
>>
>> So in resolving the overloading we still maintain the type info for arg1.
>
> So TBAA-wise the vec_ld is specced to use alias-set zero for this case
> as it loads from an unsigned char array?  Or is it alias-set zero because
> the type of arg1 is unsigned char *?  What if the type of arg1 was
> struct X *?
>
>> Earlier I had dealt with the performance issue in a different way for the
>> vec_ld and vec_st overloaded builtins, which created the rather grotty
>> code in rs6000-c.c to modify the parse trees instead.  My hope was that
>> we could simplify the code by having Will deal with them as gimple folds
>> instead.  But if in so doing we lose type information, that may not be
>> the right call.
>>
>> However, since you say that gimple aggressively removes the casts
>> from pointer types, perhaps the code that we see in early gimple from
>> the existing method might also be missing the type information?  Will,
>> it would be worth looking at that code to see.  If it's no different,
>> then perhaps we should still go ahead with the folding.
>
> As I said, you can't simply use the type of arg1 for the TBAA type.
> You can conservatively use ptr_type_node (alias-set zero), or you
> can use something that you derive from the builtin used (is a supposedly
> existing _v4si variant always subject to int * TBAA?)
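Concretely, the two options differ only in the type passed to build_int_cst
for the MEM_REF's second operand: in GIMPLE, that operand's type is what
determines the alias set of the access.  A sketch of both choices, reusing
the names from the patch above (illustrative only, not code from the patch):

  /* Conservative: ptr_type_node puts the load in alias-set zero, so it
     conflicts with every other memory access.  */
  g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
                                        build_int_cst (ptr_type_node, 0)));

  /* Builtin-derived: build the access type from the element type of the
     result vector, so e.g. the _v4si variant gets int-based TBAA.  */
  tree access_type = build_pointer_type (TREE_TYPE (lhs_type));
  g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
                                        build_int_cst (access_type, 0)));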
After thinking about this a while, I believe Will should use ptr_type_node
here.  I think anything we do to try to enforce some TBAA on these pointer
types will be fragile.  Supposedly a _v2df variant should point only to [an
array of] double or to a vector double, and parsing enforces that the user
has at least cast to such a type, so they assert they know what they're
doing.  Beyond that we needn't be too fussed if it's actually a struct X *
or the like.  We already have issues with "vector long" and "vector long
long" being different types in theory but aliased together for 64-bit
because they are the same in practice.

As long as we are still commoning identical loads (which didn't used to
happen before the parsing-level expansion was done), I'll be happy.  We can
always revisit this later if we feel like refined TBAA would solve a
concrete problem.

Bill

>
>> Another note for Will: The existing code gives up when -maltivec=be has
>> been specified, and you probably want to do that as well.  That may be
>> why you initially turned off big endian -- it is easy to misread that
>> code.  -maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.
>>
>> Thanks,
>> Bill
>>>
>>> Richard.
>>>
>>>> Bill
>>>>>
>>>>>>
>>>>>> I think if the builtins have any TBAA constraints you need to build
>>>>>> those explicitly; if not, you should use ptr_type_node, aka no TBAA.
>>>>>
>>>>> The type signatures are constrained during parsing, so we should only
>>>>> see allowed pointer types on arg1 by the time we get to gimple folding.
>>>>> I think that using arg1_type should work, but I am probably missing
>>>>> something subtle, so please feel free to whack me on the temple until
>>>>> I get it. :-)
>>>>>
>>>>> Bill
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> +        gimple_set_location (g, loc);
>>>>>>> +        gsi_replace (gsi, g, true);
>>>>>>> +
>>>>>>> +        return true;
>>>>>>> +
>>>>>>> +      }
>>>>>>> +
>>>>>>>      default:
>>>>>>>        if (TARGET_DEBUG_BUILTIN)
>>>>>>>          fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>>>>>>>                   fn_code, fn_name1, fn_name2);
>>>>>>>        break;
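For reference, combining Richard's gimple_seq suggestion with the
ptr_type_node conclusion above, the body of the LVX cases might end up
looking something like the sketch below.  This is only a sketch of where
the thread points, not the tested v3 patch; the names follow the v2 patch.

  arg0 = gimple_call_arg (stmt, 0);  // offset
  arg1 = gimple_call_arg (stmt, 1);  // address
  lhs = gimple_call_lhs (stmt);
  location_t loc = gimple_location (stmt);
  /* arg1 may have been cast by the user, and GIMPLE prunes pointer casts,
     so don't trust TREE_TYPE (arg1) for TBAA; use ptr_type_node
     (alias-set zero) instead.  */
  tree arg1_type = ptr_type_node;
  tree lhs_type = TREE_TYPE (lhs);
  /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  */
  gimple_seq stmts = NULL;
  tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
  tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
                                 arg1_type, arg1, temp_offset);
  /* Mask off any lower bits from the address: lvx always loads from a
     16-byte boundary.  */
  tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR, arg1_type,
                                    temp_addr,
                                    build_int_cst (arg1_type, -16));
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
  /* Zero offset in the MEM_REF; its ptr_type_node type yields alias-set
     zero for the access.  */
  gimple *g = gimple_build_assign (lhs,
                                   build2 (MEM_REF, lhs_type, aligned_addr,
                                           build_int_cst (arg1_type, 0)));
  gimple_set_location (g, loc);
  gsi_replace (gsi, g, true);
  return true;

Keeping arg1_type as the type of the address arithmetic while switching it
to ptr_type_node leaves the POINTER_PLUS_EXPR/BIT_AND_EXPR sequence
unchanged and only relaxes the alias set of the final load.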