On Tue, 11 Feb 2025, Tamar Christina wrote:

> Hi All,
> 
> This fixes two PRs on Early break vectorization by delaying the safety checks
> to vectorizable_load when the VF, VMAT and vectype are all known.
> 
> This patch does add two new restrictions:
> 
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>    group sizes, as they are unaligned every n % 2 iterations and so may cross
>    a page unwittingly.
> 
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
>    if we cannot peel for alignment, as the alignment requirement is quite
>    large at GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial
>    so we don't support it for now.
> 
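
To spell out the reasoning behind the two restrictions: an access whose start
is aligned to its own power-of-two size (no larger than a page) can never
straddle a page, but a group size such as 3 makes the total access size a
non-power-of-two, so aligning to it does not help.  A rough standalone C
sketch, not vectorizer code; crosses_page, is_pow2 and
alignment_keeps_access_in_page are names made up for illustration, and the
page size is simply passed in rather than taken from min-pagesize:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if X is a non-zero power of two.  */
static bool
is_pow2 (uintptr_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* True if the byte range [ADDR, ADDR + SIZE) touches more than one page.
   PAGESIZE must be a power of two and SIZE must be non-zero.  */
static bool
crosses_page (uintptr_t addr, uintptr_t size, uintptr_t pagesize)
{
  assert (size != 0 && is_pow2 (pagesize));
  return (addr / pagesize) != ((addr + size - 1) / pagesize);
}

/* Hypothetical helper: can a grouped access of GROUP_SIZE vectors of
   VECTYPE_SIZE bytes each be kept inside one page purely by aligning its
   start address to the total access size?  That only works when the total
   is a power of two that fits in a page.  */
static bool
alignment_keeps_access_in_page (uintptr_t group_size, uintptr_t vectype_size,
                                uintptr_t pagesize)
{
  uintptr_t total = group_size * vectype_size;
  return is_pow2 (total) && total <= pagesize;
}
```

With a group size of 2 and 16-byte vectors, every 32-byte-aligned access stays
within one 4096-byte page.  With a group size of 3 the 48-byte access starting
at offset 4080 (a multiple of 48) spills into the next page.
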
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
> 
> Note that for VLA I have still left this fully disabled when not working on a
> fixed buffer.
> 
> VLA targets like SVE return element alignment as the desired vector
> alignment.  This means that the loads are never misaligned and so,
> annoyingly, it won't ever need to peel.
> 
> So what I think needs to happen in GCC 16 is the following:
> 
> 1. During vect_compute_data_ref_alignment we need to take the max of
>    POLY_VALUE_MIN and vector_alignment.
> 
> 2. In vect_do_peeling, define skip_vector when peeling for alignment (PFA)
>    for VLA, and in the guard add a check that ncopies * vectype does not
>    exceed POLY_VALUE_MAX, which we use as a proxy for pagesize.
> 
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>    vect_determine_partial_vectors_and_peeling since the first iteration has to
>    be partial.  If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P is false we have to
>    fail to vectorize.
> 
> 4. Create a default mask to be used, so that
>    vect_use_loop_mask_for_alignment_p becomes true and we generate the peeled
>    check through loop control for partial loops.  From what I can tell this
>    won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any
>    peeling support at all in the compiler.  That would need to be done
>    independently from the above.

We basically need to implement peeling/versioning for alignment based
on the actual POLY value with the fallback being first-fault loads.
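
For reference, the scalar peel amount for a given runtime alignment is just
arithmetic on the address.  A minimal C sketch, assuming the target alignment
is a power of two and a multiple of the element size; peel_iterations and its
parameters are hypothetical and this is not what the vectorizer emits:

```c
#include <assert.h>
#include <stdint.h>

/* Number of scalar iterations to run before ADDR + n * ELEM_SIZE becomes
   aligned to TARGET_ALIGN bytes.  TARGET_ALIGN must be a power of two and a
   multiple of ELEM_SIZE, and ADDR must already be element aligned.
   TARGET_ALIGN stands in for whatever the runtime vector alignment is.  */
static unsigned
peel_iterations (uintptr_t addr, unsigned elem_size, unsigned target_align)
{
  assert (target_align != 0 && (target_align & (target_align - 1)) == 0);
  assert (elem_size != 0 && target_align % elem_size == 0);
  assert (addr % elem_size == 0);

  /* Bytes past the previous aligned boundary.  */
  uintptr_t misalign = addr & (target_align - 1);

  /* Bytes up to the next boundary, or 0 if already aligned.  */
  uintptr_t bytes = (target_align - misalign) & (target_align - 1);
  return (unsigned) (bytes / elem_size);
}
```

With target_align equal to the element size, which is what the quoted text
says SVE currently reports, the result is always 0, so no peeling ever happens
there.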

> In any case, not GCC 15 material so I've kept the WIP patches I have
> downstream.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf and x86_64-pc-linux-gnu (-m32, -m64) with no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/118464
>       PR tree-optimization/116855
>       * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>       * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>       checks.
>       (vect_compute_data_ref_alignment): Remove alignment checks and move to
>       get_load_store_type, increase group access alignment.
>       (vect_enhance_data_refs_alignment): Add note to comment needing
>       investigating.
>       (vect_analyze_data_refs_alignment): Likewise.
>       (vect_supportable_dr_alignment): For group loads look at first DR.
>       * tree-vect-stmts.cc (get_load_store_type):
>       Perform safety checks for early break pfa.
>       * tree-vectorizer.h (dr_peeling_alignment,
>       dr_set_peeling_alignment): New.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR tree-optimization/118464
>       PR tree-optimization/116855
>       * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>       load type is relaxed later.
>       * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>       * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets.
>       * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>       * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>       * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>       * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
>       * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
>       * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
>       * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
>       * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
>       * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
>       * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
>       * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
>       * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
>       * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
>       * gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
>       * gcc.target/i386/pr90178.c: Likewise.
>       * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> 
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d8917863dc06bbf3e2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will register in a basic block.
>  Work bound when discovering transitive relations from existing relations.
>  
>  @item min-pagesize
> -Minimum page size for warning purposes.
> +Minimum page size for warning and early break vectorization purposes.
>  
>  @item openacc-kernels
>  Specify mode of OpenACC `kernels' constructs handling.
> diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> index 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078deee32911f31fafd94 100644
> --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> @@ -10,6 +10,7 @@ inline Iter
>  my_find(Iter first, Iter last, Pred pred)
>  {
>  #pragma GCC unroll 4
> +#pragma GCC novector
>      while (first != last && !pred(*first))
>          ++first;
>      return first;
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <cstddef>
> +
> +struct ts1 {
> +  int spans[6][2];
> +};
> +struct gg {
> +  int t[6];
> +};
> +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> +  ts1 ret;
> +  for (size_t i = 0; i != t; i++) {
> +    if (!(i < t)) __builtin_abort();
> +    ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> +    ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> +  }
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> index a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc2088e72d85f249bf341 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> @@ -29,6 +29,7 @@ int main ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> index 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52dc6ce9b9ccb8c64a0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != n)
> @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ib[i] != k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> index 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9544cdbe2b1da4914 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> @@ -23,6 +23,7 @@ int main ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> index dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648ea1a1f2d2699c07c1 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-vectorize" } */
>  
>  /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
>   * two ivs i and p2 can be eliminate.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> index a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d0183c1a51de9fbc42 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> @@ -5,6 +5,7 @@ int*
>  foo (int* mem, int sz, int val)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i < sz; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> index 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399b4374a474c66b34b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> @@ -5,6 +5,7 @@ int*
>  foo (int* mem, int sz, int val)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i != sz; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> index 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450788da871ea0cea74 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> @@ -5,6 +5,7 @@ int*
>  foo (int* mem, int beg, int end, int val)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = beg; i < end; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> index b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419805fd7ce6e69ffa 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> @@ -5,6 +5,7 @@ int*
>  foo (int* mem, char sz, int val)
>  {
>    char i;
> +#pragma GCC novector
>    for (i = 0; i < sz; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> index d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2ca6ff877a2aac7e05b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> @@ -5,6 +5,7 @@ int*
>  foo (int* mem, unsigned char sz, int val)
>  {
>    unsigned char i;
> +#pragma GCC novector
>    for (i = 0; i < sz; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> index a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5996a5f1a8ede9ead9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> index f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3bc1ecc36d463f2ee7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> index ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188ede7482ad01bb99 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
>    if (sum != val)
>      abort ();
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> index 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689d2594b240fec4e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
>    if (sum != sval)
>      abort ();
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> index 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde408903c4f2674acc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> index 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393dc15f2f5a3e5f55e5 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> index 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f59ad2a10aec7b0088 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> index 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00be18fc8054644655 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> index ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7d88a5714770f147f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> index c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659ddf748a16904b5d9e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> index 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d52df89a3da2dd3e4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> index 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4eb10aa33cd0af2 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < len; i++)
>      if (a[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -55,7 +55,9 @@ int main()
>         }
>      }
>    rephase ();
> +#pragma GCC novector
>    for (i = 0; i < 32; ++i)
> +#pragma GCC novector
>      for (j = 0; j < 3; ++j)
>  #pragma GCC novector
>        for (k = 0; k < 3; ++k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba39aabc0468737c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -5,7 +5,8 @@
>  /* { dg-additional-options "-O3" } */
>  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>  
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* Arm and -m32 create a group size of 3 here, which we can't support yet.  */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-* } || { { x86_64-*-* i?86-*-* } && ilp64 } } } } } */
>  
>  typedef struct filter_list_entry {
>    const char *name;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, b, c, d, e, f;
> +short g[1];
> +int main() {
> +  int h;
> +  while (a) {
> +    while (h)
> +      ;
> +    for (b = 2; b; b--) {
> +      while (c)
> +        ;
> +      f = g[a];
> +      if (d)
> +        break;
> +    }
> +    while (e)
> +      ;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> +    for (int i = 1; i < n; i++) {
> +        if (string[i] == c)
> +            return &string[i];
> +    }
> +    return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..82d473a279ce060c550289c61729d9f9b56f0d2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> @@ -0,0 +1,24 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Alignment requirement too big, load lanes targets can't safely vectorize this.  */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! vect_load_lanes } } } } */
> +
> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 0; i < (n - 2); i+=2)
> + {
> +   if (vect_a[i] > x || vect_a[i+2] > x)
> +     return 1;
> +
> +   vect_b[i] = x;
> +   vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020];
> +
> +char * find(int n, char c)
> +{
> +    for (int i = 0; i < n; i++) {
> +        if (string[i] == c)
> +            return &string[i];
> +    }
> +    return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> +    for (int i = 1; i < n; i++) {
> +        if (string[i] == c)
> +            return &string[i];
> +    }
> +    return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +char string[1020] __attribute__((aligned(1)));
> +
> +char * find(int n, char c)
> +{
> +    for (int i = 0; i < n; i++) {
> +        if (string[i] == c)
> +            return &string[i];
> +    }
> +    return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> +   if (vect[i] > x)
> +     return 1;
> +
> +   vect[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 1; i < n; i++)
> + {
> +   if (vect_a[i] > x || vect_b[i] > x)
> +     return 1;
> +
> +   vect_a[i] = x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> @@ -0,0 +1,23 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* This should be vectorizable through load_lanes and linear targets.  */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 0; i < n; i+=2)
> + {
> +   if (vect_a[i] > x || vect_a[i+1] > x)
> +     return 1;
> +
> +   vect_b[i] = x;
> +   vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56cecc60750394e10f2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> @@ -0,0 +1,25 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> +   if (vect_a[i] > x || vect_a[i+1] > x)
> +     return 1;
> +
> +   vect_b[i] = x;
> +   vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..31e2096209253539483253efc17499a53d112894
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> @@ -0,0 +1,28 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* Group size is uneven, load lanes targets can't safely vectorize this.  */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> +
> +
> +char vect_a[1025];
> +char vect_b[1025];
> +
> +unsigned test4(char x, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 1; i < (n - 2); i+=2)
> + {
> +   if (vect_a[i-1] > x || vect_a[i+2] > x)
> +     return 1;
> +
> +   vect_b[i] = x;
> +   vect_b[i+1] = x+1;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
>   return ret;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> +/* Cannot safely vectorize this due to the group misalignment.  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c b/gcc/testsuite/gcc.target/i386/pr90178.c
> index 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed73e85ba982b42 100644
> --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> @@ -4,6 +4,7 @@
>  int*
>  find_ptr (int* mem, int sz, int val)
>  {
> +#pragma GCC novector
>    for (int i = 0; i < sz; i++)
>      if (mem[i] == val) 
>        return &mem[i];
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 
> 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4fdbcc5973f2b513
>  100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info 
> loop_vinfo)
>         if (is_gimple_debug (stmt))
>           continue;
>  
> -       stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +       stmt_vec_info stmt_vinfo
> +         = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
>         auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
>         if (!dr_ref)
>           continue;
> @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences (loop_vec_info 
> loop_vinfo)
>            bounded by VF so accesses are within range.  We only need to check
>            the reads since writes are moved to a safe place where if we get
>            there we know they are safe to perform.  */
> -       if (DR_IS_READ (dr_ref)
> -           && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> +       if (DR_IS_READ (dr_ref))
>           {
> -           if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> -               || STMT_VINFO_STRIDED_P (stmt_vinfo))
> -             {
> -               const char *msg
> -                 = "early break not supported: cannot peel "
> -                   "for alignment, vectorization would read out of "
> -                   "bounds at %G";
> -               return opt_result::failure_at (stmt, msg, stmt);
> -             }
> -
> -           dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> -           dr_info->need_peeling_for_alignment = true;
> +           dr_set_peeling_alignment (stmt_vinfo, true);
>  
>             if (dump_enabled_p ())
>               dump_printf_loc (MSG_NOTE, vect_location,
> -                              "marking DR (read) as needing peeling for "
> -                              "alignment at %G", stmt);
> +                              "marking DR (read) as possibly needing peeling 
> "
> +                              "for alignment at %G", stmt);
>           }
>  
>         if (DR_IS_READ (dr_ref))
> @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
>     Compute the misalignment of the data reference DR_INFO when vectorizing
>     with VECTYPE.
>  
> -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, *RESULT 
> will
> -   be set appropriately on failure (but is otherwise left unchanged).
> -
>     Output:
>     1. initialized misalignment info for DR_INFO
>  
> @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo)
>  
>  static void
>  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> -                              tree vectype, opt_result *result = nullptr)
> +                              tree vectype)
>  {
>    stmt_vec_info stmt_info = dr_info->stmt;
>    vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info *vinfo, 
> dr_vec_info *dr_info,
>      = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
>                BITS_PER_UNIT);
>  
> -  /* If this DR needs peeling for alignment for correctness, we must
> -     ensure the target alignment is a constant power-of-two multiple of the
> -     amount read per vector iteration (overriding the above hook where
> -     necessary).  */
> -  if (dr_info->need_peeling_for_alignment)
> +  /* If we have a grouped access we require that the alignment be VF * elem. 
>  */
> +  if (loop_vinfo
> +      && dr_peeling_alignment (stmt_info)
> +      && STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      {
> -      /* Vector size in bytes.  */
> -      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> (vectype));
> -
> -      /* We can only peel for loops, of course.  */
> -      gcc_checking_assert (loop_vinfo);
> -
> -      /* Calculate the number of vectors read per vector iteration.  If
> -      it is a power of two, multiply through to get the required
> -      alignment in bytes.  Otherwise, fail analysis since alignment
> -      peeling wouldn't work in such a case.  */
> -      poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> -     num_scalars *= DR_GROUP_SIZE (stmt_info);
> -
> -      auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> -      if (!pow2p_hwi (num_vectors))
> -     {
> -       *result = opt_result::failure_at (vect_location,
> -                                         "non-power-of-two num vectors %u "
> -                                         "for DR needing peeling for "
> -                                         "alignment at %G",
> -                                         num_vectors, stmt_info->stmt);
> -       return;
> -     }
> -
> -      safe_align *= num_vectors;
> -      if (maybe_gt (safe_align, 4096U))
> -     {
> -       pretty_printer pp;
> -       pp_wide_integer (&pp, safe_align);
> -       *result = opt_result::failure_at (vect_location,
> -                                         "alignment required for correctness"
> -                                         " (%s) may exceed page size",
> -                                         pp_formatted_text (&pp));
> -       return;
> -     }
> -
> -      unsigned HOST_WIDE_INT multiple;
> -      if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> -       || !pow2p_hwi (multiple))
> +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +      vector_alignment
> +     = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));

I think we discussed this before, also when introducing peeling
for alignment support.  This is incorrect for grouped accesses where
the number of scalar elements accessed is GROUP_SIZE * vf, so you
miss a multiplication by GROUP_SIZE here.

Note that this (and also your VF * element_size) can result in a
non-power-of-two value.

That said, I'm quite sure we don't want to have a dr->target_alignment
that isn't power-of-two, so if the computation doesn't end up with a
power-of-two value we should leave it as the target prefers and
fix up (or fail) during vectorizable_load.
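
To illustrate the arithmetic only, here is a minimal sketch in plain C.
The helper names are hypothetical and are not GCC internals; the sketch
assumes constant (non-VLA) values.  It computes the bytes accessed per
vector iteration as GROUP_SIZE * VF * element size and only accepts the
result as an alignment when it is a power of two; otherwise it returns 0
to mean "keep what the target prefers and decide later".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bytes touched by one vector iteration of a
   grouped access, i.e. GROUP_SIZE * VF scalar elements.  */
static uint64_t
bytes_per_vector_iteration (uint64_t group_size, uint64_t vf,
                            uint64_t elem_size)
{
  assert (group_size > 0 && vf > 0 && elem_size > 0);
  return group_size * vf * elem_size;
}

/* True if X is a non-zero power of two.  */
static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: return the per-iteration byte count when it is
   usable as an alignment, or 0 when it is not a power of two (meaning:
   leave the target's preferred alignment alone and fix up or fail in a
   later check).  */
static uint64_t
safe_target_alignment (uint64_t group_size, uint64_t vf,
                       uint64_t elem_size)
{
  uint64_t bytes = bytes_per_vector_iteration (group_size, vf, elem_size);
  return is_pow2 (bytes) ? bytes : 0;
}
```

For example, a group of 2 chars with VF 16 gives 32 bytes, which is
usable, while a group of 3 chars with VF 4 gives 12 bytes, which is not.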

> +      if (dump_enabled_p ())
>       {
> -       if (dump_enabled_p ())
> -         {
> -           dump_printf_loc (MSG_NOTE, vect_location,
> -                            "forcing alignment for DR from preferred (");
> -           dump_dec (MSG_NOTE, vector_alignment);
> -           dump_printf (MSG_NOTE, ") to safe align (");
> -           dump_dec (MSG_NOTE, safe_align);
> -           dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> -         }
> -       vector_alignment = safe_align;
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "alignment increased due to early break to ");
> +       dump_dec (MSG_NOTE, vector_alignment);
> +       dump_printf (MSG_NOTE, " bytes.\n");
>       }
>      }
>  
> @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>        || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
>                                      loop_preheader_edge (loop))
>        || loop->inner
> +      /* We don't currently maintaing the LCSSA for prologue peeled inversed
> +      loops.  */
>        || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
>      do_peeling = false;
>  
> @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>         if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
>             && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
>           continue;
> -       opt_result res = opt_result::success ();
> +
>         vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> -                                        STMT_VINFO_VECTYPE (dr_info->stmt),
> -                                        &res);
> -       if (!res)
> -         return res;
> +                                        STMT_VINFO_VECTYPE (dr_info->stmt));
>       }
>      }
>  
> @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
> dr_vec_info *dr_info,
>  
>    if (misalignment == 0)
>      return dr_aligned;
> -  else if (dr_info->need_peeling_for_alignment)
> +  else if (dr_peeling_alignment (stmt_info))
>      return dr_unaligned_unsupported;
>  
>    /* For now assume all conditional loads/stores support unaligned
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee37b3fa1dc95079b
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
> stmt_info,
>        return false;
>      }
>  
> +  /* If this DR needs peeling for alignment for correctness, we must
> +     ensure the target alignment is a constant power-of-two multiple of the
> +     amount read per vector iteration (overriding the above hook where
> +     necessary).  */
> +  if (dr_peeling_alignment (stmt_info))
> +    {
> +      /* We can only peel for loops, of course.  */
> +      gcc_checking_assert (loop_vinfo);
> +
> +      /* Check if we support the operation if early breaks are needed.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +       && (*memory_access_type == VMAT_GATHER_SCATTER
> +           || *memory_access_type == VMAT_STRIDED_SLP))
> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "early break not supported: cannot peel for "
> +                          "alignment. With non-contiguous memory 
> vectorization"
> +                          " could read out of bounds at %G ",
> +                          STMT_VINFO_STMT (stmt_info));
> +       return false;
> +     }
> +
> +      /* Even if uneven group sizes are aligned on the first load, the second
> +      iteration won't be.  As such reject uneven group sizes.  */
> +      if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +       && (DR_GROUP_SIZE (stmt_info) % 2) == 1)

Hmm, a group size of 6 is even, but with a vector size of four the
second iteration still isn't aligned.  So we need a power-of-two
GROUP_SIZE * VF and a byte alignment according to that.
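
As a worked example of that case, the sketch below (plain C, with a
hypothetical helper name; it assumes a page-aligned base address and
constant sizes) looks for the first vector iteration whose reads
straddle a page boundary.  With a group size of 6, VF 4 and char
elements each iteration reads 24 bytes, and since 24 does not divide
the page size some iteration eventually crosses a page.  With a
power-of-two amount such as 32 bytes that never happens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first vector iteration
   whose byte range [k * bytes_per_iter, (k + 1) * bytes_per_iter)
   straddles a page boundary, assuming the base is page aligned.
   Returns -1 if none of the first ITERS iterations does.  */
static int64_t
first_page_crossing (uint64_t bytes_per_iter, uint64_t page_size,
                     int64_t iters)
{
  assert (bytes_per_iter > 0 && page_size > 0);
  for (int64_t k = 0; k < iters; k++)
    {
      uint64_t first = (uint64_t) k * bytes_per_iter;
      uint64_t last = first + bytes_per_iter - 1;
      /* The first and last byte read land on different pages.  */
      if (first / page_size != last / page_size)
        return k;
    }
  return -1;
}
```

Calling first_page_crossing (24, 4096, 1000) returns 170, because that
iteration reads bytes 4080 to 4103, whereas first_page_crossing (32,
4096, 1000) returns -1.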

> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "early break not supported: uneven group size, "
> +                          "vectorization could read out of bounds at %G ",
> +                          STMT_VINFO_STMT (stmt_info));
> +       return false;
> +     }
> +
> +      /* Vector size in bytes.  */
> +      poly_uint64 safe_align;
> +      if (nunits.is_constant ())
> +     safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> +      else
> +     safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                                        POLY_VALUE_MAX);
> +
> +      auto num_vectors = ncopies;
> +      if (!pow2p_hwi (num_vectors))
> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "non-power-of-two num vectors %u "
> +                          "for DR needing peeling for "
> +                          "alignment at %G",
> +                          num_vectors, STMT_VINFO_STMT (stmt_info));
> +       return false;
> +     }
> +
> +      safe_align *= num_vectors;
> +      bool inbounds
> +     = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> +                               DR_REF (STMT_VINFO_DATA_REF (stmt_info)));

I'm again confused why you think ref_within_array_bound can be used to
validate anything?

> +      /* For VLA we have to insert a runtime check that the vector loads
> +      per iterations don't exceed a page size.  For now we can use
> +      POLY_VALUE_MAX as a proxy as we can't peel for VLA.  */
> +      if (maybe_gt (safe_align, (unsigned)param_min_pagesize)
> +       /* We don't support PFA for VLA at the moment.  Some targets like SVE
> +          return a target alignment requirement of a single element.  For
> +          early break this is potentially unsafe so we can't count on
> +          alignment rejecting such loops later as it thinks loads are never
> +          misaligned.  */
> +       || (!nunits.is_constant () && !inbounds))
> +     {
> +       if (dump_enabled_p ())
> +         {
> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                            "alignment required for correctness (");
> +           dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> +           dump_printf (MSG_NOTE, ") may exceed page size\n");
> +         }
> +       return false;
> +     }
> +      *alignment_support_scheme = dr_unaligned_supported;

and the only acceptable state should be
*alignment_support_scheme == dr_aligned, and with a possibly too low
target_alignment even that's not enough.

Can you split out the testsuite part that just adds #pragma GCC novector?
That part is OK.

Thanks,
Richard.

> +    }
> +
>    if (*alignment_support_scheme == dr_unaligned_unsupported)
>      {
>        if (dump_enabled_p ())
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba0dbfbcca027441
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info)
>  }
>  #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
>  
> +/* Return if the stmt_vec_info requires peeling for alignment.  */
> +inline bool
> +dr_peeling_alignment (stmt_vec_info stmt_info)
> +{
> +  dr_vec_info *dr_info;
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> +  else
> +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> +  return dr_info->need_peeling_for_alignment;
> +}
> +
> +/* Set the need_peeling_for_alignment for the the stmt_vec_info, if group
> +   access then set on the fist element otherwise set on DR directly.  */
> +inline void
> +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment)
> +{
> +  dr_vec_info *dr_info;
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> +  else
> +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +
> +  dr_info->need_peeling_for_alignment = requires_alignment;
> +}
> +
>  inline void
>  set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
>  {
> 
> 
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
