Richard Biener <rguent...@suse.de> writes:
> On Tue, 4 Mar 2025, Tamar Christina wrote:
>
>> Hi All,
>> 
>> This fixes two PRs on Early break vectorization by delaying the safety checks to
>> vectorizable_load when the VF, VMAT and vectype are all known.
>> 
>> This patch does add two new restrictions:
>> 
>> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>>    group sizes, as they are unaligned every n % 2 iterations and so may cross
>>    a page unwittingly.

Sorry for the drive-by comment, but: it might be worth updating the
commit message to say non-power-of-2, rather than uneven.  The patch
uses the right check, but the message made it sound like it didn't.
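
To illustrate the distinction, here is a minimal C sketch.  The helper
names are hypothetical and are not taken from the patch; they only show
that "uneven" read literally (odd) and "non-power-of-2" disagree for a
group size such as 6.

```c
#include <stdbool.h>

/* Hypothetical helper: "uneven" read literally, i.e. an odd group size.  */
static bool group_size_is_odd (unsigned n)
{
  return (n & 1u) != 0;
}

/* Hypothetical helper: true when N is not a power of 2 (0 is treated as
   not being a power of 2).  */
static bool group_size_is_non_pow2 (unsigned n)
{
  return n == 0 || (n & (n - 1)) != 0;
}
```

A group size of 3 is rejected by either test, whereas a group size of 6
is only rejected by the non-power-of-2 test.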

Thanks,
Richard

>> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
>>    we cannot peel for alignment, as the alignment requirement is quite large at
>>    GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
>>    don't support it for now.
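
As a rough illustration of the two restrictions quoted above, the
following C sketch checks whether a grouped access touches two pages.
The helper names and the 4096-byte page size used in the note below are
assumptions made for the example only; this is not the vectorizer's
actual logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bytes read by one grouped (load-lanes) access,
   which is also the alignment the quoted text says would be required.  */
static uintptr_t group_access_bytes (uintptr_t group_size,
                                     uintptr_t vector_bytes)
{
  return group_size * vector_bytes;
}

/* Hypothetical helper: true if the byte range [addr, addr + size) spans
   more than one page.  SIZE and PAGESIZE must be nonzero.  */
static bool crosses_page (uintptr_t addr, uintptr_t size, uintptr_t pagesize)
{
  return (addr / pagesize) != ((addr + size - 1) / pagesize);
}
```

With 16-byte vectors and an assumed 4096-byte page, a group of 2 reads
32 bytes; an access aligned to 32 (for example at address 4064) stays
within one page.  A group of 3 reads 48 bytes, and an access at address
4080, which is a multiple of 48, still runs into the next page.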
>> 
>> There are other steps documented inside the code itself so that the reasoning
>> is next to the code.
>> 
>> As a fall-back, when the alignment fails we require partial vector support.
>> 
>> For VLA targets like SVE, return element alignment as the desired vector
>> alignment.  This means that the loads are never misaligned and so, annoyingly,
>> it won't ever need to peel.
>> 
>> So what I think needs to happen in GCC 16 is the following:
>> 
>> 1. During vect_compute_data_ref_alignment we need to take the max of
>>    POLY_VALUE_MIN and vector_alignment.
>> 
>> 2. In vect_do_peeling, define skip_vector when PFA for VLA, and in the guard
>>    add a check that ncopies * vectype does not exceed POLY_VALUE_MAX, which we
>>    use as a proxy for pagesize.
>> 
>> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>>    vect_determine_partial_vectors_and_peeling since the first iteration has to
>>    be partial.  If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P is false we have to
>>    fail to vectorize.
>> 
>> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
>>    becomes true and we generate the peeled check through loop control for
>>    partial loops.  From what I can tell this won't work for
>>    LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>>    all in the compiler.  That would need to be done independently from the
>>    above.
>> 
>> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
>> 
>> Bootstrapped and regtested on aarch64-none-linux-gnu,
>> arm-none-linux-gnueabihf and x86_64-pc-linux-gnu (-m32, -m64) with no issues.
>> 
>> Ok for master?
>
> OK.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>>      PR tree-optimization/118464
>>      PR tree-optimization/116855
>>      * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>>      * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>>      checks.
>>      (vect_compute_data_ref_alignment): Remove alignment checks and move to
>>      get_load_store_type, increase group access alignment.
>>      (vect_enhance_data_refs_alignment): Add note to comment needing
>>      investigating.
>>      (vect_analyze_data_refs_alignment): Likewise.
>>      (vect_supportable_dr_alignment): For group loads look at first DR.
>>      * tree-vect-stmts.cc (get_load_store_type):
>>      Perform safety checks for early break pfa.
>>      * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>>      dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>>      (need_peeling_for_alignment): Renamed to...
>>      (safe_speculative_read_required): ...this.
>>      (class dr_vec_info): Add scalar_access_known_in_bounds.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      PR tree-optimization/118464
>>      PR tree-optimization/116855
>>      * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>>      load type is relaxed later.
>>      * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>>      * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>>      * gcc.dg/vect/vect-early-break_128.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_26.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_43.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_44.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>>      * gcc.dg/vect/vect-early-break_7.c: Likewise.
>>      * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>>      * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>>      * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>>      * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>>      * gcc.dg/vect/vect-early-break_53.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_56.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_57.c: Likewise.
>>      * gcc.dg/vect/vect-early-break_81.c: Likewise.
>> 
>> ---
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 6f8bf3923863dee9ed35b0497f1ef58a65726701..a4c62e50785362c93de31ac44f4fb5cbf4d1e1ee 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -17260,7 +17260,7 @@ Maximum number of relations the oracle will register in a basic block.
>>  Work bound when discovering transitive relations from existing relations.
>>  
>>  @item min-pagesize
>> -Minimum page size for warning purposes.
>> +Minimum page size for warning and early break vectorization purposes.
>>  
>>  @item openacc-kernels
>>  Specify mode of OpenACC `kernels' constructs handling.
>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
>> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> @@ -55,7 +55,9 @@ int main()
>>        }
>>      }
>>    rephase ();
>> +#pragma GCC novector
>>    for (i = 0; i < 32; ++i)
>> +#pragma GCC novector
>>      for (j = 0; j < 3; ++j)
>>  #pragma GCC novector
>>        for (k = 0; k < 3; ++k)
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c72d4a5b04cc05c7 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> @@ -5,7 +5,8 @@
>>  /* { dg-additional-options "-O3" } */
>>  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* Arm and -m32 create a group size of 3 here, which we can't support yet.  AArch64 makes elementwise accesses here.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { aarch64*-*-* } } } } */
>>  
>>  typedef struct filter_list_entry {
>>    const char *name;
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> index 6d7fb920ec2de529a4aa1de2c4a04286989204fd..ed6baf2d451f3887076a1e9143035363128efe70 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> @@ -3,7 +3,8 @@
>>  /* { dg-require-effective-target vect_early_break } */
>>  /* { dg-require-effective-target vect_int } */
>>  
>> -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
>> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" { target { ! vect_partial_vectors } } } } */
>>  /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
>>  
>>  #ifndef N
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +int a, b, c, d, e, f;
>> +short g[1];
>> +int main() {
>> +  int h;
>> +  while (a) {
>> +    while (h)
>> +      ;
>> +    for (b = 2; b; b--) {
>> +      while (c)
>> +        ;
>> +      f = g[a];
>> +      if (d)
>> +        break;
>> +    }
>> +    while (e)
>> +      ;
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 1; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dd05046982524f15662be8df517716b581b8a2d9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Alignment requirement too big, load lanes targets can't safely vectorize this.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_partial_vectors || vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +
>> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 0; i < (n - 2); i+=2)
>> + {
>> +   if (vect_a[i] > x || vect_a[i+2] > x)
>> +     return 1;
>> +
>> +   vect_b[i] = x;
>> +   vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..085dd9b81bb6943440f34d044cbd24ee2121657c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* Gathers and scatters are not safe to speculate across early breaks.  */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +
>> +#define N 1024
>> +int vect_a[N];
>> +int vect_b[N];
>> +  
>> +int test4(int x, int stride)
>> +{
>> + int ret = 0;
>> + for (int i = 0; i < (N / stride); i++)
>> + {
>> +   vect_b[i] += x + i;
>> +   if (vect_a[i*stride] == x)
>> +     return i;
>> +   vect_a[i] += x * vect_b[i];
>> +   
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 0; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 1; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 0; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 0; i < n; i++)
>> + {
>> +   if (vect[i] > x)
>> +     return 1;
>> +
>> +   vect[i] = x;
>> + }
>> + return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 1; i < n; i++)
>> + {
>> +   if (vect_a[i] > x || vect_b[i] > x)
>> +     return 1;
>> +
>> +   vect_a[i] = x;
>> + }
>> + return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..cf76c7109edce15f860cdc27e10850ef5a31fc9a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* This should be vectorizable through load_lanes and linear targets.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +
>> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 0; i < n; i+=2)
>> + {
>> +   if (vect_a[i] > x || vect_a[i+1] > x)
>> +     return 1;
>> +
>> +   vect_b[i] = x;
>> +   vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..25d3a62356baf127c89187b150810e4d31567c6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 1; i < (n - 2); i+=2)
>> + {
>> +   if (vect_a[i] > x || vect_a[i+1] > x)
>> +     return 1;
>> +
>> +   vect_b[i] = x;
>> +   vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..10eb98b726acb32a0d1de4daf202724995bfa1a6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Group size is uneven and second group is misaligned.  Needs partial vectors.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{  
>> + unsigned ret = 0;
>> + for (int i = 1; i < (n - 2); i+=2)
>> + {
>> +   if (vect_a[i-1] > x || vect_a[i+2] > x)
>> +     return 1;
>> +
>> +   vect_b[i] = x;
>> +   vect_b[i+1] = x+1;
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> index babc79c74c39b5beedd293f2138f0c46846543b0..edddb44bad66aa419d097f69ca850e5eaa66e014 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> @@ -5,7 +5,8 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
>>  
>>  #ifndef N
>>  #define N 803
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> index dec0b492ab883de6e02944a95fd554a109a68a39..8f5ccc45ce06ed36627107e080d633e55e254fa0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> @@ -5,7 +5,9 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>>  
>>  #include <complex.h>
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> index 039aac7fd84cf6131e1ea401b87385a32b545e67..7ac1e76f0aca37aa04a767b6034000f09aaf98b8 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> @@ -5,7 +5,7 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>  
>>  #include <stdbool.h>
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> index f73f3c2eb86e804803a969dab983dc9e39eed66a..483ea5f243c825d6a6c4f5aa7f86c3f9eb8b2e10 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> @@ -5,7 +5,7 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>  
>>  #include <stdbool.h>
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> index b3f5984f682f30f79331d48a264c2cc4af3e2503..f8f84fab97ab586847000af8b89448b0885ef5fc 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> @@ -42,4 +42,6 @@ main ()
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet.  */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> index 47d2a50218bd1b32fe43edcaaabb1079d0b26223..643016b2ccfea29ba36d65c8070f255cb8179481 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> @@ -41,4 +41,6 @@ main ()
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet.  */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> index 8062fbbf6422af6a2e42de9574e88d411a8fb917..36fc6a6eb60fae70f8f05a3d9435f5adce025847 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x)
>>   return ret;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_load_lanes } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
>>   return ret;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> +/* Cannot safely vectorize this due to the group misalignment.  */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>>   return ret;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet.  */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>>   return ret;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet.  */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> index a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452b79f3e0fabba8b3c0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> @@ -2,6 +2,7 @@
>>  /* { dg-do compile } */
>>  /* { dg-require-effective-target vect_early_break } */
>>  /* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>  
>>  void abort ();
>>  int a[64], b[64];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> index 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f7ad915a3f81c38c4 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> @@ -4,6 +4,7 @@
>>  /* { dg-require-effective-target vect_int } */
>>  /* { dg-add-options bind_pic_locally } */
>>  /* { dg-require-effective-target vect_early_break_hw } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>  
>>  #include <stdarg.h>
>>  #include "tree-vect.h"
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> index 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286febea2b31cb7be8499 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> @@ -5,8 +5,9 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads of different alignments, we can't peel this. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>>  
>>  void abort ();
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> index 7b870e9c60dcac6164d879dd70c1fc07ec0221fe..c7cce81f52c80d83bd2c1face8cbd13f93834531 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> @@ -5,7 +5,9 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>>  
>>  #define N 1024
>>  unsigned vect_a[N];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> index d218a0686719fee4c167684dcf26402851b53260..34d187483320b9cc215304b73e28d45d7031516e 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> @@ -5,7 +5,10 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops.  */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>> +
>>  
>>  #include <complex.h>
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> index 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea44ab4750473589a9 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> @@ -5,8 +5,9 @@
>>  
>>  /* { dg-additional-options "-Ofast" } */
>>  
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads with different misalignments.  Can't peel; need partial loop support.  */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>>  void abort ();
>>  
>>  unsigned short sa[32];
>> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
>> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..9949fc3d98852399242a96095f4dae5ffe7613b3 100644
>> --- a/gcc/tree-vect-data-refs.cc
>> +++ b/gcc/tree-vect-data-refs.cc
>> @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>>        if (is_gimple_debug (stmt))
>>          continue;
>>  
>> -      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
>> +      stmt_vec_info stmt_vinfo
>> +        = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
>> +      stmt = STMT_VINFO_STMT (stmt_vinfo);
>>        auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
>>        if (!dr_ref)
>>          continue;
>> @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>>           bounded by VF so accesses are within range.  We only need to check
>>           the reads since writes are moved to a safe place where if we get
>>           there we know they are safe to perform.  */
>> -      if (DR_IS_READ (dr_ref)
>> -          && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
>> +      if (DR_IS_READ (dr_ref))
>>          {
>> -          if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
>> -              || STMT_VINFO_STRIDED_P (stmt_vinfo))
>> -            {
>> -              const char *msg
>> -                = "early break not supported: cannot peel "
>> -                  "for alignment, vectorization would read out of "
>> -                  "bounds at %G";
>> -              return opt_result::failure_at (stmt, msg, stmt);
>> -            }
>> -
>> -          dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
>> -          dr_info->need_peeling_for_alignment = true;
>> +          dr_set_safe_speculative_read_required (stmt_vinfo, true);
>> +          bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref));
>> +          DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) = inbounds;
>>  
>>            if (dump_enabled_p ())
>>              dump_printf_loc (MSG_NOTE, vect_location,
>> -                             "marking DR (read) as needing peeling for "
>> -                             "alignment at %G", stmt);
>> +                             "marking DR (read) as possibly needing peeling "
>> +                             "for alignment at %G", stmt);
>>          }
>>  
>>        if (DR_IS_READ (dr_ref))
>> @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
>>     Compute the misalignment of the data reference DR_INFO when vectorizing
>>     with VECTYPE.
>>  
>> -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, *RESULT will
>> -   be set appropriately on failure (but is otherwise left unchanged).
>> -
>>     Output:
>>     1. initialized misalignment info for DR_INFO
>>  
>> @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo)
>>  
>>  static void
>>  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>> -                             tree vectype, opt_result *result = nullptr)
>> +                             tree vectype)
>>  {
>>    stmt_vec_info stmt_info = dr_info->stmt;
>>    vec_base_alignments *base_alignments = &vinfo->base_alignments;
>> @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>>      = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
>>               BITS_PER_UNIT);
>>  
>> -  /* If this DR needs peeling for alignment for correctness, we must
>> -     ensure the target alignment is a constant power-of-two multiple of the
>> -     amount read per vector iteration (overriding the above hook where
>> -     necessary).  */
>> -  if (dr_info->need_peeling_for_alignment)
>> +  if (loop_vinfo
>> +      && dr_safe_speculative_read_required (stmt_info))
>>      {
>> -      /* Vector size in bytes.  */
>> -      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
>> -
>> -      /* We can only peel for loops, of course.  */
>> -      gcc_checking_assert (loop_vinfo);
>> -
>> -      /* Calculate the number of vectors read per vector iteration.  If
>> -     it is a power of two, multiply through to get the required
>> -     alignment in bytes.  Otherwise, fail analysis since alignment
>> -     peeling wouldn't work in such a case.  */
>> -      poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +      auto vectype_size
>> +    = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>> +      poly_uint64 new_alignment = vf * vectype_size;
>> +      /* If we have a grouped access we require that the alignment be N * elem.  */
>>        if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> -    num_scalars *= DR_GROUP_SIZE (stmt_info);
>> +    new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>>  
>> -      auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
>> -      if (!pow2p_hwi (num_vectors))
>> -    {
>> -      *result = opt_result::failure_at (vect_location,
>> -                                        "non-power-of-two num vectors %u "
>> -                                        "for DR needing peeling for "
>> -                                        "alignment at %G",
>> -                                        num_vectors, stmt_info->stmt);
>> -      return;
>> -    }
>> -
>> -      safe_align *= num_vectors;
>> -      if (maybe_gt (safe_align, 4096U))
>> -    {
>> -      pretty_printer pp;
>> -      pp_wide_integer (&pp, safe_align);
>> -      *result = opt_result::failure_at (vect_location,
>> -                                        "alignment required for correctness"
>> -                                        " (%s) may exceed page size",
>> -                                        pp_formatted_text (&pp));
>> -      return;
>> -    }
>> -
>> -      unsigned HOST_WIDE_INT multiple;
>> -      if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
>> -      || !pow2p_hwi (multiple))
>> +      unsigned HOST_WIDE_INT target_alignment;
>> +      if (new_alignment.is_constant (&target_alignment)
>> +      && pow2p_hwi (target_alignment))
>>      {
>>        if (dump_enabled_p ())
>>          {
>>            dump_printf_loc (MSG_NOTE, vect_location,
>> -                           "forcing alignment for DR from preferred (");
>> -          dump_dec (MSG_NOTE, vector_alignment);
>> -          dump_printf (MSG_NOTE, ") to safe align (");
>> -          dump_dec (MSG_NOTE, safe_align);
>> -          dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
>> +                           "alignment increased due to early break to ");
>> +          dump_dec (MSG_NOTE, new_alignment);
>> +          dump_printf (MSG_NOTE, " bytes.\n");
>>          }
>> -      vector_alignment = safe_align;
>> +      vector_alignment = target_alignment;
>>      }
>>      }
>>  
>> @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>>        || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
>>                                     loop_preheader_edge (loop))
>>        || loop->inner
>> +      /* We don't currently maintain the LCSSA for prologue peeled inversed
>> +     loops.  */
>>        || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
>>      do_peeling = false;
>>  
>> @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info loop_vinfo)
>>        if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
>>            && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
>>          continue;
>> -      opt_result res = opt_result::success ();
>> +
>>        vect_compute_data_ref_alignment (loop_vinfo, dr_info,
>> -                                       STMT_VINFO_VECTYPE (dr_info->stmt),
>> -                                       &res);
>> -      if (!res)
>> -        return res;
>> +                                       STMT_VINFO_VECTYPE (dr_info->stmt));
>>      }
>>      }
>>  
>> @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>>  
>>    if (misalignment == 0)
>>      return dr_aligned;
>> -  else if (dr_info->need_peeling_for_alignment)
>> +  else if (dr_safe_speculative_read_required (stmt_info))
>>      return dr_unaligned_unsupported;
>>  
>>    /* For now assume all conditional loads/stores support unaligned
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..743631f944884a31505a95f7a188fd4e4ca3797d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
>>        return false;
>>      }
>>  
>> +
>> +  /* Checks if all scalar iterations are known to be inbounds.  */
>> +  bool inbounds = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
>> +
>> +  /* Check if we support the operation if early breaks are needed.  Here we
>> +     must ensure that we don't access any more than the scalar code would
>> +     have.  A masked operation would ensure this, so for these load types
>> +     force masking.  */
>> +  if (loop_vinfo
>> +      && dr_safe_speculative_read_required (stmt_info)
>> +      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
>> +      && (*memory_access_type == VMAT_GATHER_SCATTER
>> +      || *memory_access_type == VMAT_STRIDED_SLP))
>> +    {
>> +      if (dump_enabled_p ())
>> +      dump_printf_loc (MSG_NOTE, vect_location,
>> +                       "early break not supported: cannot peel for "
>> +                       "alignment. With non-contiguous memory vectorization"
>> +                       " could read out of bounds at %G ",
>> +                       STMT_VINFO_STMT (stmt_info));
>> +    if (inbounds)
>> +      LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>> +    else
>> +      return false;
>> +    }
>> +
>> +  /* If this DR needs alignment for correctness, we must ensure the target
>> +     alignment is a constant power-of-two multiple of the amount read per
>> +     vector iteration or force masking.  */
>> +  if (dr_safe_speculative_read_required (stmt_info)
>> +      && *alignment_support_scheme == dr_aligned)
>> +    {
>> +      /* We can only peel for loops, of course.  */
>> +      gcc_checking_assert (loop_vinfo);
>> +
>> +      auto target_alignment
>> +    = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
>> +      unsigned HOST_WIDE_INT target_align;
>> +
>> +      bool group_aligned = false;
>> +      if (target_alignment.is_constant (&target_align)
>> +      && nunits.is_constant ())
>> +    {
>> +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +      auto vectype_size
>> +        = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>> +      poly_uint64 required_alignment = vf * vectype_size;
>> +      /* If we have a grouped access we require that the alignment be N * elem.  */
>> +      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> +        required_alignment *=
>> +            DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> +      if (!multiple_p (target_alignment, required_alignment))
>> +        {
>> +          if (dump_enabled_p ())
>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                         "desired alignment %wu not met. Instead got %wu "
>> +                         "for DR alignment at %G",
>> +                         required_alignment.to_constant (),
>> +                         target_align, STMT_VINFO_STMT (stmt_info));
>> +          return false;
>> +        }
>> +
>> +      if (!pow2p_hwi (target_align))
>> +        {
>> +          if (dump_enabled_p ())
>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                         "non-power-of-two vector alignment %wd "
>> +                         "for DR alignment at %G",
>> +                         target_align, STMT_VINFO_STMT (stmt_info));
>> +          return false;
>> +        }
>> +
>> +      /* For VLA we have to insert a runtime check that the vector loads
>> +         per iterations don't exceed a page size.  For now we can use
>> +         POLY_VALUE_MAX as a proxy as we can't peel for VLA.  */
>> +      if (known_gt (required_alignment, (unsigned)param_min_pagesize))
>> +        {
>> +          if (dump_enabled_p ())
>> +            {
>> +              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                           "alignment required for correctness (");
>> +              dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment);
>> +              dump_printf (MSG_NOTE, ") may exceed page size\n");
>> +            }
>> +          return false;
>> +        }
>> +
>> +      group_aligned = true;
>> +    }
>> +
>> +      /* There are multiple loads that have a misalignment that we couldn't
>> +     align.  We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
>> +     vectorize. */
>> +      if (!group_aligned)
>> +    {
>> +      if (inbounds)
>> +        LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
>> +      else
>> +        return false;
>> +    }
>> +
>> +      /* When using a group access the first element may be aligned but the
>> +     subsequent loads may not be.  For LOAD_LANES since the loads are based
>> +     on the first DR then all loads in the group are aligned.  For
>> +     non-LOAD_LANES this is not the case. In particular a load + blend when
>> +     there are gaps can have the non first loads issued unaligned, even
>> +     partially overlapping the memory of the first load in order to simplify
>> +     the blend.  This is what the x86_64 backend does for instance.  As
>> +     such only the first load in the group is aligned, the rest are not.
>> +     Because of this the permutes may break the alignment requirements that
>> +     have been set, and as such we should for now, reject them.  */
>> +      if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
>> +    {
>> +      if (dump_enabled_p ())
>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                         "loads with load permutations not supported for "
>> +                         "speculative early break loads for %G",
>> +                         STMT_VINFO_STMT (stmt_info));
>> +      return false;
>> +    }
>> +    }
>> +
>>    if (*alignment_support_scheme == dr_unaligned_unsupported)
>>      {
>>        if (dump_enabled_p ())
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -1281,7 +1281,11 @@ public:
>>  
>>    /* Set by early break vectorization when this DR needs peeling for alignment
>>       for correctness.  */
>> -  bool need_peeling_for_alignment;
>> +  bool safe_speculative_read_required;
>> +
>> +  /* Set by early break vectorization when this DR's scalar accesses are known
>> +     to be inbounds of a known bounds loop.  */
>> +  bool scalar_access_known_in_bounds;
>>  
>>    tree base_decl;
>>  
>> @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
>>    return dr_info->target_alignment;
>>  }
>>  #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
>> +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds
>> +
>> +/* Return true if the stmt_vec_info requires peeling for alignment.  */
>> +inline bool
>> +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
>> +{
>> +  dr_vec_info *dr_info;
>> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> +  else
>> +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
>> +
>> +  return dr_info->safe_speculative_read_required;
>> +}
>> +
>> +/* Set safe_speculative_read_required for the stmt_vec_info.  For a group
>> +   access set it on the first element, otherwise set it on the DR directly.  */
>> +inline void
>> +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
>> +                                   bool requires_alignment)
>> +{
>> +  dr_vec_info *dr_info;
>> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>> +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
>> +  else
>> +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
>> +
>> +  dr_info->safe_speculative_read_required = requires_alignment;
>> +}
>>  
>>  inline void
>>  set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
>> 
>> 
>> 
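
As a reading aid for the alignment checks in the get_load_store_type hunk
quoted above, here is a small standalone C sketch of the arithmetic involved.
It is not GCC code.  MIN_PAGESIZE is a stand-in for param_min_pagesize (the
value 4096 is an assumption), and required_alignment, crosses_page and
speculative_read_safe are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for param_min_pagesize; 4096 is illustrative only.  */
#define MIN_PAGESIZE 4096u

/* Bytes read per vector iteration for one data reference:
   VF * element size, scaled by the group size for grouped accesses
   (pass 1 for a non-grouped access).  */
static uint64_t
required_alignment (uint64_t vf, uint64_t elem_size, uint64_t group_size)
{
  assert (vf != 0 && elem_size != 0 && group_size != 0);
  return vf * elem_size * group_size;
}

static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* True if a read of SIZE bytes starting at ADDR touches two pages.  */
static bool
crosses_page (uint64_t addr, uint64_t size)
{
  assert (size != 0);
  return addr / MIN_PAGESIZE != (addr + size - 1) / MIN_PAGESIZE;
}

/* A read of ALIGN bytes from an ALIGN-aligned address cannot straddle a
   page when ALIGN is a power of two no larger than the page size, because
   the page size is then a multiple of ALIGN.  */
static bool
speculative_read_safe (uint64_t align)
{
  return is_pow2 (align) && align <= MIN_PAGESIZE;
}
```

With VF = 4 and 4-byte elements, a group size of 2 gives 32 bytes, which
passes; a group size of 3 gives 48 bytes, which is not a power of two and is
rejected.  That is the non-power-of-2 (rather than uneven) case mentioned
earlier in the thread.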
