Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

Richard Biener Wed, 21 Sep 2016 07:02:52 -0700

On Tue, Sep 20, 2016 at 5:25 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> I originally posted a patch improving code generation for alias check in 
> vectorizer at https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00929.html.  Here 
> it's the 2nd version (better) patch.  It detects data reference pair in which 
> the two references are only different to each other wrto index.  In this case 
> the patch generates intersect range checks based on index of address of 
> reference, rather than the address of reference.  Take example from patch's 
> comment, given below two data references:
>
>                        DR_A                           DR_B
>       data-ref         arr[i]                         arr[j]
>       base_object      arr                            arr
>       index            {i_0, +, 1}_loop               {j_0, +, 1}_loop
>
> The addresses and their index can be depicted as:
>
>         |<- ADDR_A    ->|          |<- ADDR_B    ->|
>      ------------------------------------------------------->
>         |   |   |   |   |          |   |   |   |   |
>      ------------------------------------------------------->
>         i_0 ...         i_0+4      j_0 ...         j_0+4
>
> We can create expression based on index rather than address:  (i_0 + 4 < j_0 
> || j_0 + 4 < i_0).
>
> Also take example from spec2k/200.sixtrack/DACOP, gcc needs to generate alias 
> check for three pairs:
> //pair 1
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: _92 = da.cc[_27];
>   ref: da.cc[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.cc[_93] = _92;
>   ref: da.cc[_93];
> )
> //pair 2
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: pretmp_29 = da.i2[_27];
>   ref: da.i2[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.i2[_93] = pretmp_29;
>   ref: da.i2[_93];
> )
> //pair 3
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: pretmp_28 = da.i1[_27];
>   ref: da.i1[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.i1[_93] = pretmp_28;
>   ref: da.i1[_93];
> )
>
> With this patch, code can be improved to:
>
>   <bb 7>:
>   _37 = (unsigned int) _6;
>   _106 = (unsigned int) _52;
>   _107 = _37 - _106;
>   _108 = _107 > 7;
>   _109 = (integer(kind=8)) _2;
>   _110 = _109 + 3;
>   _111 = (integer(kind=8)) _52;
>   _112 = _111 + -1;
>   _113 = _110 < _112;
>   _114 = (integer(kind=8)) _52;
>   _115 = _114 + 2;
>   _116 = (integer(kind=8)) _2;
>   _117 = _115 < _116;
>   _118 = _113 | _117;
>   _119 = _108 & _118;
>   _120 = (integer(kind=8)) _2;
>   _121 = _120 + 3;
>   _122 = (integer(kind=8)) _52;
>   _123 = _122 + -1;
>   _124 = _121 < _123;
>   _125 = (integer(kind=8)) _52;
>   _126 = _125 + 2;
>   _127 = (integer(kind=8)) _2;
>   _128 = _126 < _127;
>   _129 = _124 | _128;
>   _130 = _119 & _129;
>   _131 = (integer(kind=8)) _2;
>   _132 = _131 + 3;
>   _133 = (integer(kind=8)) _52;
>   _134 = _133 + -1;
>   _135 = _132 < _134;
>   _136 = (integer(kind=8)) _52;
>   _137 = _136 + 2;
>   _138 = (integer(kind=8)) _2;
>   _139 = _137 < _138;
>   _140 = _135 | _139;
>   _141 = _130 & _140;
>   if (_141 != 0)
>     goto <bb 8>;
>   else
>     goto <bb 20>;
>
> Note this patch doesn't do local CSE, and common sub-expressions will be 
> optimized by later passes.  I think this is OK because the redundancy is far 
> away from loops thus won't have bad effect (there is an opposite example/PR).
> Bootstrap and test on x86_64 and AArch64, is it OK?


Seeing

+  /* Infer index length from segment length.  */
+  unsigned HOST_WIDE_INT idx_len1 = (seg_len1 + abs_step - 1) / abs_step;
+  unsigned HOST_WIDE_INT idx_len2 = (seg_len2 + abs_step - 1) / abs_step;

Does this really work for DR_STEP that is bigger than 1 in index space?  The
segment length is generally vectorization-factor * DR_STEP but index space
is -- if you are talking about array-ref indexes -- the segment length divided
by the array element size.  Note that what the index space refers to for
a given DR_ACCESS_FN depends on whether this is from an ARRAY_REF
(element size) or the base pointer evolution (bytes).  I suppose looking at
the access functions type might distinguish the pointer vs. array index case.

What I'm looking for is a big comment on why the above "translation" is
correct.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-09-19  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-vect-loop-manip.c (create_intersect_range_checks_index): New.
>         (create_intersect_range_checks): New.
>         (vect_create_cond_for_alias_checks): Call above function.

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

Reply via email to