On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> This patch tries to calculate conservatively-correct distance
>>>> vectors for two references whose base addresses are not the same.
>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>>> isn't guaranteed to occur.
>>>>
>>>> The motivating example is:
>>>>
>>>>   struct s { int x[8]; };
>>>>   void
>>>>   f (struct s *a, struct s *b)
>>>>   {
>>>>     for (int i = 0; i < 8; ++i)
>>>>       a->x[i] += b->x[i];
>>>>   }
>>>>
>>>> in which the "a" and "b" accesses are either independent or have a
>>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>>> prevents vectorisation, so we can vectorise without an alias check.
>>>>
>>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>>
>>>>   void
>>>>   f (int a[][8], struct b[][8])
>>>>   {
>>>>     for (int i = 0; i < 8; ++i)
>>>>       a[0][i] += b[0][i];
>>>>   }
>>>>
>>>> I think this is valid because C11 6.7.6.2/6 says:
>>>>
>>>>   For two array types to be compatible, both shall have compatible
>>>>   element types, and if both size specifiers are present, and are
>>>>   integer constant expressions, then both size specifiers shall have
>>>>   the same constant value.
>>>>
>>>> So if we access an array through an int (*)[8], it must have type X[8]
>>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>>
>>>> However, Richard B said that (at least in gimple) we support arbitrary
>>>> overlap of arrays and allow arrays to be accessed with different
>>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>>> only handled references that end in a structure field access.
>>>>
>>>> There are two ways of handling these dependences in the vectoriser:
>>>> use them to limit VF, or check at runtime as before.  I've gone for
>>>> the approach of checking at runtime if we can, to avoid limiting VF
>>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>>> aren't allowed.
>>>>
>>>> The patch tests whether we queued an alias check with a dependence
>>>> distance of X and then picked a VF <= X, in which case it's safe to
>>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>>> be called twice with different VF for the same loop, it's no longer
>>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>>> comp_alias_ddrs to check whether versioning is necessary.
>>>>
>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> You seem to do your "fancy" thing but also later compute the old
>>> base equality anyway (for same_base_p).  It looks to me for this
>>> case the new fancy code can be simply skipped, keeping num_dimensions
>>> as before?
>>>
>>> +      /* Try to approach equal type sizes.  */
>>> +      if (!COMPLETE_TYPE_P (type_a)
>>> +         || !COMPLETE_TYPE_P (type_b)
>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>>> +       break;
>>>
>>> ah, interesting idea to avoid a quadratic search.  Note that you should
>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>>> as they are used for type-punning.
>
> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
> dr_analyze_indices allows, so I think we safe in terms of the tree codes.

Yeah.  I think we need to document that we should have a 1:1 match here.

>>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>>> ARRAY_REFs - but I also see there:
>>>
>>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>>          for common blocks instead of using unions like everyone else.  */
>>>       tree type1 = DECL_CONTEXT (field1);
>>>       tree type2 = DECL_CONTEXT (field2);
>>>
>>> so you probably can't simply use TREE_TYPE (outer_ref) for type 
>>> compatibility.
>>> You also may not use types_compatible_p here as for LTO that is _way_ too
>>> lax for aggregates.  The above uses
>>>
>>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>>          return false;
>>>
>>> so you should also bail out on unions here, rather than the check you do 
>>> later.
>
> The loop stops before we get to a union, so I think "only" the RECORD_TYPE
> COMPONENT_REF handling is a potential problem.  Does this mean that
> I should use the nonoverlapping_component_refs_of_decl_p code:
>
>       tree field1 = TREE_OPERAND (ref1, 1);
>       tree field2 = TREE_OPERAND (ref2, 1);
>
>       /* ??? We cannot simply use the type of operand #0 of the refs here
>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>          for common blocks instead of using unions like everyone else.  */
>       tree type1 = DECL_CONTEXT (field1);
>       tree type2 = DECL_CONTEXT (field2);
>
>       /* We cannot disambiguate fields in a union or qualified union.  */
>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>          return false;
>
>       if (field1 != field2)
>         {
>           /* A field and its representative need to be considered the
>              same.  */
>           if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
>               || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
>             return false;
>           /* Different fields of the same record type cannot overlap.
>              ??? Bitfields can overlap at RTL level so punt on them.  */
>           if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>             return false;
>           return true;
>         }
>
> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
> during the new loop?

Yes.  OTOH you want to "match" while the above disambiguates.  So it means
you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL
equality (which should be the same in the end).  The RTL concern
should not matter
here.

>  And test for this as well as unions in the outer
> references?

So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT,
and you (should) stop the ref walk at DR_BASE_OBJECT.  The dr_analyze_indices
code is also somewhat fishy in that it simply ignores everything below unhandled
component-refs even if there are indices involved (and it gets away
with this because
dependence analysis likely/hopefully gives up on the DR_BASE_OBJECT equality
test in case it is sth like a[i].union for example ... hopefully ...).

>>> You seem to rely on getting an access_fn entry for each handled_component_p.
>>> It looks like this is the case -- we even seem to stop at unions (with the 
>>> same
>>> fortran "issue").  I'm not sure that's the best thing to do but you
>>> rely on that.
>
> Yeah, the loop is deliberately limited to the components associated with
> an access_fn.  I did wonder at first whether dr_analyze_indices should
> store the original component reference trees for each access function.
> That would make things simpler and more explicit, but would also eat up
> more memory.  Things like object_address_invariant_in_loop_p rely on the
> access_fns in the same way that the loop in the patch does.

in fact it fails to handle ARRAY_RANGE_REFs ...

>>> I don't understand the looping, it needs more comments.  You seem to be
>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>>> is how many compatible refs you found on the way (with incompatible ones
>>> not counting?!).  What about an inner varying array of structs?  This seems 
>>> to
>>> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
>>> b[i].s.b[i].j?
>
> I'll try to improve the comments.  But the idea is that both sequences are
> as long as possible, while that still gives compatible types.  If there is
> more than one such sequence, we pick the one nearest the base.
>
> So in your example, the access functions would be:
>
>                0   1   2   3   4
>   a:          .j [i]  .b  .s [i]
>
>            0   1   2   3   4   5
>   b:  __real  .j [i]  .b  .s [i]
>
> If a and b are pointers, the final access functions would be
> unconstrained base accesses, so we'd end up with:
>
>   a: [0, 3]
>   b: [1, 4]
>
> for both sequences.
>
>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>>> conveniently start from the other
>>> end of the ref here.
>>
>> That said, for the motivational cases we either have one ref having
>> more dimensions than the other (the __real vs. full complex access) or
>> they have the same number of dimensions (and no access fn for the
>> base).
>>
>> For the first case we should simply "drop" access_fns of the larger
>> dimensional ref (from the start, plus outer component refs) up to the
>> point the number of dimensions are equal.
>
> Yeah, that's what happens for your example.  But if we had:
>
>     a[i].s.c.d
>     __real b[i].s.b[i].j
>
> (where d is the same type as the real component) then the access
> functions would be:
>
>                    0   1   2   3
>   a:              .d  .c  .s [i]
>
>            0   1   2   3   4   5
>   b:  __real  .j [i]  .b  .s [i]
>
> Comparing the a0/b2 column doesn't make sense, because one's an array
> and the other is a structure.  In this case the sequence we care about is:
>
>   a: [1, 3]
>   b: [3, 5]
>
> which is what the loop gives.  The a1/b3 column is the one that proves
> there's no dependence.
>
>> Then we have the case of
>>
>>   ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>>
>> where we have to punt.
>>
>> Then we have the case of
>>
>>   ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>>
>> which is where the new code should kick in to see if we can drop access_fns
>> from the other end (as unanalyzable but either having distance zero or not
>> aliased because of TBAA).
>>
>> At least your testcases suggest you do not want to handle
>>
>>  struct s { int x[N]; };
>>  struct r { struct s s; };
>>  f (struct s *a, struct r *b)
>>  {
>>     for (i = 0; i < N; ++i)
>>       a->s.x[i] = b->x[i];
>>  }
>>
>> ?
>>
>> With this example your loop which seems to search for a "common"
>> sequence in (different) midst of the reference trees makes more sense
>> (still that loop is awkward to understand).
>
> Yeah, I want to handle that too, just hadn't thought of it as a specific
> testcase.  The code does give the expected dependence distance of 0.

Ok.

I think the patch is reasonable, maybe the loop can be restructured / simplified
a bit and handling of the union case for example be done first (by looking at
DR_BASE_OBJECT).

Thanks,
Richard.

>
> Thanks,
> Richard

Reply via email to