On Mon, 14 Oct 2024, Tamar Christina wrote:

> Hi All,
>
> Consider loops such as:
>
> void test9(unsigned char *x, long long *y, int n, unsigned char k) {
>     for(int i = 0; i < n; i++) {
>         y[i] = k + x[i];
>     }
> }
>
> where today we generate:
>
> .L5:
>         ldr     q29, [x5], 16
>         add     x4, x4, 128
>         uaddl   v1.8h, v29.8b, v30.8b
>         uaddl2  v29.8h, v29.16b, v30.16b
>         zip1    v2.8h, v1.8h, v31.8h
>         zip1    v0.8h, v29.8h, v31.8h
>         zip2    v1.8h, v1.8h, v31.8h
>         zip2    v29.8h, v29.8h, v31.8h
>         sxtl    v25.2d, v2.2s
>         sxtl    v28.2d, v0.2s
>         sxtl    v27.2d, v1.2s
>         sxtl    v26.2d, v29.2s
>         sxtl2   v2.2d, v2.4s
>         sxtl2   v0.2d, v0.4s
>         sxtl2   v1.2d, v1.4s
>         sxtl2   v29.2d, v29.4s
>         stp     q25, q2, [x4, -128]
>         stp     q27, q1, [x4, -96]
>         stp     q28, q0, [x4, -64]
>         stp     q26, q29, [x4, -32]
>         cmp     x5, x6
>         bne     .L5
>
> Note how the zero extend from short to long is halfway through the chain
> transformed into a sign extend.  There are two problems with this:
>
>   1. sign extends are typically slower than zero extends on many uArches.
>   2. it prevents vectorizable_conversion from attempting to do a single
>      step promotion.
>
> These sign extends happen due to the various range reduction optimizations
> and patterns we have, such as multiplication widening, etc.
>
> My first attempt to fix this was to update the patterns so that when the
> original source is a zero extend, they do not add the intermediate sign
> extend.
>
> However this behavior happens in many other places, and as new patterns
> get added the problem can be re-introduced.
>
> Instead I have added a new pattern vect_recog_zero_extend_chain_pattern
> that attempts to simplify and extend an existing zero extend over multiple
> conversion statements.
>
> As an example, T3 a = (T3)(signed T2)(unsigned T1)x where
> bitsize T3 > T2 > T1 gets transformed into
> T3 a = (T3)(signed T2)(unsigned T2)x.
>
> The final cast to signed is kept so the types in the tree still match.  It
> will be correctly elided later on.
>
> This representation is optimal as vectorizable_conversion is already able
> to decompose a long promotion into multiple steps if the target does not
> support it in a single step.  More importantly it allows us to do proper
> costing and to support conversions like (double)x, where bitsize(x) < int,
> in an efficient manner.
>
> To do this I have used Ranger's on-demand analysis to perform the check to
> see if an extension can be removed and extended to a zero extend.  The
> reason for this is that the vectorizer introduces several patterns that
> are not in the IL, but also lots of widening IFNs for which handling in a
> switch wouldn't be very future proof.
>
> I did try to do it without Ranger, but Ranger had three benefits:
>
>   1. It simplified the handling of the IL changes the vectorizer
>      introduces, and makes it future proof.
>   2. Ranger has the advantage of doing the transformation in cases where
>      it knows that the top bits of the value are zero, which we wouldn't
>      be able to tell by looking purely at statements.
>   3. Ranger simplified the handling of corner cases.  Without it the
>      handling was quite complex and I wasn't very confident in its
>      correctness.
>
> So I think Ranger is the right way to go here...  With these changes the
> above now generates:
>
> .L5:
>         add     x4, x4, 128
>         ldr     q26, [x5], 16
>         uaddl   v2.8h, v26.8b, v31.8b
>         uaddl2  v26.8h, v26.16b, v31.16b
>         tbl     v4.16b, {v2.16b}, v30.16b
>         tbl     v3.16b, {v2.16b}, v29.16b
>         tbl     v24.16b, {v2.16b}, v28.16b
>         tbl     v1.16b, {v26.16b}, v30.16b
>         tbl     v0.16b, {v26.16b}, v29.16b
>         tbl     v25.16b, {v26.16b}, v28.16b
>         tbl     v2.16b, {v2.16b}, v27.16b
>         tbl     v26.16b, {v26.16b}, v27.16b
>         stp     q4, q3, [x4, -128]
>         stp     q1, q0, [x4, -64]
>         stp     q24, q2, [x4, -96]
>         stp     q25, q26, [x4, -32]
>         cmp     x5, x6
>         bne     .L5
>
> I have also seen similar improvements in codegen on Arm and x86_64,
> especially with AVX512.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with no
> issues.
>
> Hopefully OK for master?
Hohumm.  So I looked at one of the examples and I don't see any
sign-extends in the IL we vectorize.  So your pattern is about changing
int -> double to unsigned int -> double, but only so a required
intermediate int -> long conversion is done as a zero-extend?

IMO this doesn't belong in patterns but in vectorizable_conversion,
specifically in the step determining the intermediate types.

I don't quite understand what scalar pattern IL you feed to the vectorizer
in the end; a few comments - also to this effect - below.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vect-patterns.cc (vect_recog_zero_extend_chain_pattern): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/bb-slp-pattern-1.c: Update tests.
> 	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
> 	* gcc.dg/vect/vect-widen-mult-u16.c: Likewise.
> 	* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Likewise.
> 	* lib/target-supports.exp
> 	(check_effective_target_vect_widen_mult_hi_to_si_pattern,
> 	check_effective_target_vect_widen_mult_si_to_di_pattern): Enable
> 	AArch64.
> 	* gcc.target/aarch64/vect-tbl-zero-extend_2.c: New test.
> > --- > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c > b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c > index > 5ae99225273ca5f915f60ecba3a5aaedebe46e96..627de78af4e48581575beda97bf2a0708ac091cb > 100644 > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c > @@ -52,4 +52,4 @@ int main (void) > > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp2" > { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */ > /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > detected" 8 "slp2" { target vect_widen_mult_hi_to_si_pattern } } } */ > -/* { dg-final { scan-tree-dump-times "pattern recognized" 8 "slp2" { target > vect_widen_mult_hi_to_si_pattern } } } */ > +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 8 > "slp2" { target vect_widen_mult_hi_to_si_pattern } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c > b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c > index > b69ade338862cda4f44f5206d195eef1cb5e8d36..aecc085a51c93e0e7bed122df0a77a0a099ad6ef > 100644 > --- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c > +++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c > @@ -52,5 +52,5 @@ int main (void) > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_widen_mult_hi_to_si } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" > { target vect_widen_mult_hi_to_si } } } */ > /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ > -/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target > vect_widen_mult_hi_to_si_pattern } } } */ > +/* { dg-final { scan-tree-dump-times "pattern recognized" 4 "vect" { target > vect_widen_mult_hi_to_si_pattern } } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c > 
b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c > index > f0140e4ef6d70cd61aa7dbb3ba39b1da142a79b2..bd798fae7e8136975d488206cfef9e39fac2bfea > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c > @@ -11,7 +11,7 @@ > > #include "vect-over-widen-9.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+ } "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 1} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 2} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c > b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c > index > ddb3bd8c0d378f0138c8cc7f9c6ea3300744b8a8..8c0544e35c29de60e76759f4ed13206278c72925 > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c > @@ -11,7 +11,7 @@ > > #include "vect-over-widen-11.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+ } "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 1} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 2} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c > b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c > index > dfa09f5d2cafe329e6d57b5cc681786cc2c7d215..1fe0305c1c4f61d05864ef97789726a1dc6ec8b1 > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c > @@ -11,7 +11,7 @@ > > #include 
"vect-over-widen-13.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 1} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: > detected:[^\n]* = \(unsigned char\)} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c > b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c > index > 4584c586da1e6f13e8c8de4c1291cea0141ebab5..4ecdadf7a035a4f83b1767a063a1b0f47bdd543d > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c > @@ -11,7 +11,7 @@ > > #include "vect-over-widen-15.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 1} "vect" } } */ > /* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: > detected} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c > b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c > index > bda92c965e080dd3f48ec42b6bea16e79d9416cd..6b8c3dfa2c89ce04d7673607ef2d2f14a14eb32f > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c > @@ -9,7 +9,7 @@ > > #include "vect-over-widen-5.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+ } "vect" } } */ > /* { dg-final { 
scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 1} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: > detected:[^\n]* \(unsigned char\)} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c > b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c > index > 553c0712a79a1d19195dbdab7cbd6fa330685bea..1cf725ff4b7f151097192db1a0b65173c4c83b19 > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c > @@ -12,7 +12,7 @@ > > #include "vect-over-widen-7.c" > > -/* { dg-final { scan-tree-dump {Splitting statement} "vect" } } */ > +/* { dg-final { scan-tree-dump {Splitting pattern statement} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* \+ } "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: > detected:[^\n]* >> 2} "vect" } } */ > /* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: > detected:[^\n]* \(unsigned char\)} "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c > b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c > index > 258d253f401459d448d1ae86f56b0c97815d5b61..b5018f855a72534b4d64d2dc2b7ab2ac0deb674b > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c > @@ -47,5 +47,5 @@ int main (void) > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > { vect_widen_mult_hi_to_si || vect_unpack } } } } */ > /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > detected" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ > -/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target > vect_widen_mult_hi_to_si_pattern } } } */ > +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ > > diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c > b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c > index > 3baafca7b548124ae5c48fdf3c2f07c319155967..ab523ca77652e1f1533889fda9c0eb31c987ffe9 > 100644 > --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c > +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c > @@ -47,5 +47,5 @@ int main (void) > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > { vect_widen_mult_hi_to_si || vect_unpack } } } } */ > /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > detected" 1 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ > -/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target > vect_widen_mult_hi_to_si_pattern } } } */ > +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ > > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c > b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..1577eacd9dbbb52274d9f86c77406555b7726482 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-tbl-zero-extend_2.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O3 -std=c99 -march=armv8-a" } */ > + > +void test6(unsigned char *x, double *y, int n) { > + for(int i = 0; i < (n & -8); i++) { > + y[i] += x[i]; > + } > +} > + > +void test7(unsigned char *x, double *y, int n, unsigned char k) { > + for(int i = 0; i < (n & -8); i++) { > + y[i] += k * x[i]; > + } > +} > + > +void test8(unsigned char *x, double *y, int n, unsigned char k) { > + for(int i = 0; i < (n & -8); i++) { > + y[i] = k + x[i]; > + } > +} > + > +void test9(unsigned char *x, long long *y, int n, unsigned char k) { > + for(int i = 0; i < (n & -8); i++) { > + y[i] = k + x[i]; > + } > +} > + > +/* { dg-final { scan-assembler-times {\tuxtl} 1 } } */ > +/* { 
dg-final { scan-assembler-not {\tuxtl2} } } */ > +/* { dg-final { scan-assembler-not {\tzip1} } } */ > +/* { dg-final { scan-assembler-not {\tzip2} } } */ > +/* { dg-final { scan-assembler-times {\ttbl} 44 } } */ > +/* { dg-final { scan-assembler-times {\.LC[0-9]+:} 12 } } */ > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index > d113a08dff7b2a8ab5bdfe24386d271bff255afc..feae1b8fcf8cd7ab56a8c76c0cd3034c0a828724 > 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -8240,6 +8240,7 @@ proc > check_effective_target_vect_widen_mult_hi_to_si_pattern { } { > return [check_cached_effective_target_indexed > vect_widen_mult_hi_to_si_pattern { > expr { [istarget powerpc*-*-*] > || [istarget ia64-*-*] > + || [istarget aarch64*-*-*] > || [istarget loongarch*-*-*] > || [istarget i?86-*-*] || [istarget x86_64-*-*] > || ([is-effective-target arm_neon] > @@ -8259,6 +8260,7 @@ proc > check_effective_target_vect_widen_mult_si_to_di_pattern { } { > expr { [istarget ia64-*-*] > || [istarget i?86-*-*] || [istarget x86_64-*-*] > || [istarget loongarch*-*-*] > + || [istarget aarch64*-*-*] > || ([istarget s390*-*-*] > && [check_effective_target_s390_vx]) }}] > } > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index > 9bf8526ac995c6c2678b25f5df4316aec41333e0..74c7269a3ab15cba1ee2ef0556d25afda851f7f0 > 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -5524,6 +5524,122 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo, > return pattern_stmt; > } > > +/* Function vect_recog_zero_extend_chain_pattern > + > + Try to find the following pattern: > + > + type x_t; > + TYPE a_T, b_T, c_T; But a_T, b_T and c_T are different types - what types? 
> +   loop:
> +     S1  a_T = (b_T)(c_T)x_t;
> +
> +   where type 'TYPE' is an integral type which has a different size
> +   from 'type' and c_T is a zero extend or a sign extend of a value
> +   whose top bit is known to be zero.  a_T can be signed or unsigned.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt from which the pattern search begins.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.
> +     This replaces multiple chained extensions with the longest possible
> +     chain of zero extends and a final convert to the required sign.
> +
> +     S1  a_T = (a_T)(unsigned a_T)x_t;  */

So for (double)(int)x you produce (double)(int)(unsigned int)x?  It feels
like the above example misses a conversion?

I don't think the pure integer type sequence happens - we do (and should
if not) already perform canonicalization of equivalent conversion
sequences, otherwise you'd see a missed CSE (I couldn't produce such an
example).

You say you found the "issue" to be exposed in several (or found in one,
suspected in several) existing patterns.  Can you elaborate and point out
where this happens?  I don't think it covers the int -> double case, does
it?

Thanks,
Richard.
> 
> +static gimple *
> +vect_recog_zero_extend_chain_pattern (vec_info *vinfo,
> +				      stmt_vec_info stmt_vinfo,
> +				      tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (vect_stmt_to_vectorize (stmt_vinfo));
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree_code code = gimple_assign_rhs_code (last_stmt);
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +  tree rhs = gimple_assign_rhs1 (last_stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  tree rhs_type = TREE_TYPE (rhs);
> +
> +  if ((code != FLOAT_EXPR && code != NOP_EXPR)
> +      || TYPE_UNSIGNED (lhs_type)
> +      || TREE_CODE (rhs_type) != INTEGER_TYPE
> +      || TREE_CODE (rhs) != SSA_NAME
> +      || STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_internal_def)
> +    return NULL;
> +
> +  /* Check to see if it's safe to extend the zero extend to the new type.
> +     In general this is safe if the rhs1 type is unsigned or if we know
> +     that the top bits are zero; this can happen due to all the widening
> +     operations we have.  For instance a widening addition will have top
> +     bits zero.  */
> +  if (!TYPE_UNSIGNED (rhs_type))
> +    {
> +      wide_int wcst = get_nonzero_bits (rhs);
> +      if (wi::neg_p (wcst) || wi::clz (wcst) == 0)
> +	return NULL;
> +    }
> +
> +  tree cvt_type = unsigned_type_for (lhs_type);
> +
> +  tree cvt_vectype = get_vectype_for_scalar_type (vinfo, cvt_type);
> +  if (!cvt_vectype || !VECTOR_TYPE_P (cvt_vectype))
> +    return NULL;
> +
> +  tree out_vectype = get_vectype_for_scalar_type (vinfo, lhs_type);
> +  if (!out_vectype || !VECTOR_TYPE_P (out_vectype))
> +    return NULL;
> +
> +  stmt_vec_info irhs;
> +
> +  gimple_ranger ranger;
> +
> +  /* Dig through any existing conversions to see if we can extend the
> +     zero extend chain across multiple converts.  */
> +  while ((irhs = vect_get_internal_def (vinfo, rhs)))
> +    {
> +      gimple *g_irhs = STMT_VINFO_STMT (irhs);
> +      if (!is_gimple_assign (g_irhs)
> +	  || gimple_assign_rhs_code (g_irhs) != NOP_EXPR)
> +	break;
> +
> +      /* See if we can consume the next conversion as well.  To do this
> +	 it's best to use Ranger as it can see through the intermediate IL
> +	 that the vectorizer creates throughout pattern matching.  */
> +      int_range_max r;
> +      ranger.range_of_stmt (r, g_irhs);
> +      wide_int nz = r.get_nonzero_bits ();
> +      if (wi::neg_p (nz) || wi::clz (nz) == 0)
> +	break;
> +
> +      rhs = gimple_assign_rhs1 (g_irhs);
> +    }
> +
> +  /* If the result is a no-op, or we've jumped over a truncate of sorts,
> +     or if nothing would change materially, just leave it alone.  */
> +  if (TYPE_PRECISION (lhs_type) <= TYPE_PRECISION (TREE_TYPE (rhs))
> +      || (code == FLOAT_EXPR && rhs == gimple_assign_rhs1 (last_stmt)))
> +    return NULL;
> +
> +  vect_pattern_detected ("vect_recog_zero_extend_chain_pattern", last_stmt);
> +
> +  tree cast_var = vect_recog_temp_ssa_var (cvt_type, NULL);
> +  gimple *pattern_stmt = NULL;
> +  pattern_stmt = gimple_build_assign (cast_var, NOP_EXPR, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, cvt_vectype);
> +
> +  tree cvt_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> +  pattern_stmt = gimple_build_assign (cvt_var, code, cast_var);
> +
> +  *type_out = out_vectype;
> +
> +  return pattern_stmt;
> +}
> +
> 
>  /* Helper function of vect_recog_bool_pattern.  Called recursively,
>     return true if bool VAR can and should be optimized that way.  Assume
>     it shouldn't
> @@ -7509,6 +7625,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_widen_minus_pattern, "widen_minus" },
>    { vect_recog_widen_abd_pattern, "widen_abd" },
>    /* These must come after the double widening ones.  */
> +  { vect_recog_zero_extend_chain_pattern, "zero_extend_chain" },
>  };
> 
>  /* Mark statements that are involved in a pattern.
>     */

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)