> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Wednesday, February 26, 2025 1:52 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com> > Subject: RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load > [PR118464] > > On Wed, 26 Feb 2025, Tamar Christina wrote: > > > > -----Original Message----- > > > From: Richard Biener <rguent...@suse.de> > > > Sent: Wednesday, February 26, 2025 12:30 PM > > > To: Tamar Christina <tamar.christ...@arm.com> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com> > > > Subject: Re: [3/3 PATCH v3]middle-end: delay checking for alignment to > > > load > > > [PR118464] > > > > > > On Tue, 25 Feb 2025, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > This fixes two PRs on Early break vectorization by delaying the safety > > > > checks > to > > > > vectorizable_load when the VF, VMAT and vectype are all known. > > > > > > > > This patch does add two new restrictions: > > > > > > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject > > > > uneven > > > > group sizes, as they are unaligned every n % 2 iterations and so may > > > > cross > > > > a page unwittingly. > > > > > > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject > vectorization > > > if > > > > we cannot peel for alignment, as the alignment requirement is quite > > > > large at > > > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial > > > > so we > > > > don't support it for now. > > > > > > > > There are other steps documented inside the code itself so that the > > > > reasoning > > > > is next to the code. > > > > > > > > Note that for VLA I have still left this fully disabled when not > > > > working on a > > > > fixed buffer. > > > > > > > > For VLA targets like SVE return element alignment as the desired vector > > > > alignment. This means that the loads are never misaligned and so > > > > annoying it > > > > won't ever need to peel. > > > > > > > > So what I think needs to happen in GCC 16 is that. > > > > > > > > 1. during vect_compute_data_ref_alignment we need to take the max of > > > > POLY_VALUE_MIN and vector_alignment. > > > > > > > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard > add a > > > > check that ncopies * vectype does not exceed POLY_VALUE_MAX which we > use > > > as a > > > > proxy for pagesize. > > > > > > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in > > > > vect_determine_partial_vectors_and_peeling since the first iteration > > > > has to > > > > be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail > to > > > > vectorize. > > > > > > > > 4. Create a default mask to be used, so that > > > vect_use_loop_mask_for_alignment_p > > > > becomes true and we generate the peeled check through loop control > > > > for > > > > partial loops. From what I can tell this won't work for > > > > LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling > > > support at > > > > all in the compiler. That would need to be done independently from > > > > the > > > > above. > > > > > > > > In any case, not GCC 15 material so I've kept the WIP patches I have > > > downstream. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > > -m32, -m64 and no issues. > > > > > > > > Ok for master? > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > PR tree-optimization/118464 > > > > PR tree-optimization/116855 > > > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer > > > > use. > > > > * tree-vect-data-refs.cc > > > > (vect_analyze_early_break_dependences): Delay > > > > checks. > > > > (vect_compute_data_ref_alignment): Remove alignment checks and > > > > move > > > to > > > > get_load_store_type, increase group access alignment. > > > > (vect_enhance_data_refs_alignment): Add note to comment needing > > > > investigating. > > > > (vect_analyze_data_refs_alignment): Likewise. > > > > (vect_supportable_dr_alignment): For group loads look at first > > > > DR. > > > > * tree-vect-stmts.cc (get_load_store_type): > > > > Perform safety checks for early break pfa. > > > > * tree-vectorizer.h (dr_set_safe_speculative_read_required, > > > > dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): > > > New. > > > > (need_peeling_for_alignment): Renamed to... > > > > (safe_speculative_read_required): .. This > > > > (class dr_vec_info): Add scalar_access_known_in_bounds. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR tree-optimization/118464 > > > > PR tree-optimization/116855 > > > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes > > > > because the > > > > load type is relaxed later. > > > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update. > > > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes > > > > targets > > > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test. > > > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test. > > > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test. > > > > * gcc.dg/vect/vect-early-break_39.c: Update testcase for > > > > misalignment. > > > > * gcc.dg/vect/vect-early-break_53.c: Likewise. > > > > * gcc.dg/vect/vect-early-break_56.c: Likewise. > > > > * gcc.dg/vect/vect-early-break_57.c: Likewise. > > > > * gcc.dg/vect/vect-early-break_81.c: Likewise. > > > > > > > > --- > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > > index > > > > ca8e468f3f2dbf68c959f74d8fec48c79463504d..fff2874b326d605fc2656adf4ab > > > 6eb5bd5d42d71 100644 > > > > --- a/gcc/doc/invoke.texi > > > > +++ b/gcc/doc/invoke.texi > > > > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will > > > register in a basic block. > > > > Work bound when discovering transitive relations from existing > > > > relations. > > > > > > > > @item min-pagesize > > > > -Minimum page size for warning purposes. > > > > +Minimum page size for warning and early break vectorization purposes. > > > > > > > > @item openacc-kernels > > > > Specify mode of OpenACC `kernels' constructs handling. > > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > > index > > > > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d > > > 93c950629f3231554 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > > @@ -55,7 +55,9 @@ int main() > > > > } > > > > } > > > > rephase (); > > > > +#pragma GCC novector > > > > for (i = 0; i < 32; ++i) > > > > +#pragma GCC novector > > > > for (j = 0; j < 3; ++j) > > > > #pragma GCC novector > > > > for (k = 0; k < 3; ++k) > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > > index > > > > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c > > > 72d4a5b04cc05c7 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > > @@ -5,7 +5,8 @@ > > > > /* { dg-additional-options "-O3" } */ > > > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } > > > > } */ > > > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* Arm and -m32 create a group size of 3 here, which we can't support > > > > yet. > > > AArch64 makes elementwise accesses here. */ > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { > aarch64*-*- > > > * } } } } */ > > > > > > > > typedef struct filter_list_entry { > > > > const char *name; > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a > > > c83ab569fc9fbde126 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > > > @@ -0,0 +1,25 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +int a, b, c, d, e, f; > > > > +short g[1]; > > > > +int main() { > > > > + int h; > > > > + while (a) { > > > > + while (h) > > > > + ; > > > > + for (b = 2; b; b--) { > > > > + while (c) > > > > + ; > > > > + f = g[a]; > > > > + if (d) > > > > + break; > > > > + } > > > > + while (e) > > > > + ; > > > > + } > > > > + return 0; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..dc771186efafe25bb65490 > > > da7a383ad7f6ceb0a7 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > > > @@ -0,0 +1,19 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +char string[1020]; > > > > + > > > > +char * find(int n, char c) > > > > +{ > > > > + for (int i = 1; i < n; i++) { > > > > + if (string[i] == c) > > > > + return &string[i]; > > > > + } > > > > + return 0; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > > peeling" > "vect" > > > } } */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..82d473a279ce060c55028 > > > 9c61729d9f9b56f0d2a > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > > > @@ -0,0 +1,24 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* Alignment requirement too big, load lanes targets can't safely > > > > vectorize > this. > > > */ > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! > > > vect_load_lanes } } } } */ > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > peeling" > > > "vect" { target { ! vect_load_lanes } } } } */ > > > > + > > > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, > > > > int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < (n - 2); i+=2) > > > > + { > > > > + if (vect_a[i] > x || vect_a[i+2] > x) > > > > + return 1; > > > > + > > > > + vect_b[i] = x; > > > > + vect_b[i+1] = x+1; > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758 > > > ca29a5f3f9d3f6e0d1 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > > > @@ -0,0 +1,19 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +char string[1020]; > > > > + > > > > +char * find(int n, char c) > > > > +{ > > > > + for (int i = 0; i < n; i++) { > > > > + if (string[i] == c) > > > > + return &string[i]; > > > > + } > > > > + return 0; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > peeling" > > > "vect" } } */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..374a051b945e97eedb9be > > > 9da423cf54b5e564d6f > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > > > @@ -0,0 +1,20 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +char string[1020] __attribute__((aligned(1))); > > > > + > > > > +char * find(int n, char c) > > > > +{ > > > > + for (int i = 1; i < n; i++) { > > > > + if (string[i] == c) > > > > + return &string[i]; > > > > + } > > > > + return 0; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > > peeling" > "vect" > > > } } */ > > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } > > > > */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25 > > > 7ceea1c065fcc6ae9 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > > > @@ -0,0 +1,20 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +char string[1020] __attribute__((aligned(1))); > > > > + > > > > +char * find(int n, char c) > > > > +{ > > > > + for (int i = 0; i < n; i++) { > > > > + if (string[i] == c) > > > > + return &string[i]; > > > > + } > > > > + return 0; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > peeling" > > > "vect" } } */ > > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } > > > > */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..ca95be44e92e32769da1d > > > 1e9b740ae54682a3d55 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > > > @@ -0,0 +1,23 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > + > > > > +unsigned test4(char x, char *vect, int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < n; i++) > > > > + { > > > > + if (vect[i] > x) > > > > + return 1; > > > > + > > > > + vect[i] = x; > > > > + } > > > > + return ret; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > > peeling" > "vect" > > > } } */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c > > > 64a61c97b1b6268743 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > > > @@ -0,0 +1,23 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > + > > > > +unsigned test4(char x, char *vect_a, char *vect_b, int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 1; i < n; i++) > > > > + { > > > > + if (vect_a[i] > x || vect_b[i] > x) > > > > + return 1; > > > > + > > > > + vect_a[i] = x; > > > > + } > > > > + return ret; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be > > > > applied" > "vect" } > > > } */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f > > > 50776299531824ce9c > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > > > @@ -0,0 +1,23 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* This should be vectorizable through load_lanes and linear targets. > > > > */ > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > + > > > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, > > > > int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 0; i < n; i+=2) > > > > + { > > > > + if (vect_a[i] > x || vect_a[i+1] > x) > > > > + return 1; > > > > + > > > > + vect_b[i] = x; > > > > + vect_b[i+1] = x+1; > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56 > > > cecc60750394e10f2b > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > > > @@ -0,0 +1,25 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > + > > > > +char vect_a[1025]; > > > > +char vect_b[1025]; > > > > + > > > > +unsigned test4(char x, int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 1; i < (n - 2); i+=2) > > > > + { > > > > + if (vect_a[i] > x || vect_a[i+1] > x) > > > > + return 1; > > > > + > > > > + vect_b[i] = x; > > > > + vect_b[i+1] = x+1; > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..31e209620925353948325 > > > 3efc17499a53d112894 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > > > @@ -0,0 +1,28 @@ > > > > +/* { dg-add-options vect_early_break } */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-require-effective-target vect_early_break } */ > > > > +/* { dg-require-effective-target vect_int } */ > > > > + > > > > +/* { dg-additional-options "-Ofast" } */ > > > > + > > > > +/* Group size is uneven, load lanes targets can't safely vectorize > > > > this. */ > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > peeling" > > > "vect" } } */ > > > > + > > > > + > > > > +char vect_a[1025]; > > > > +char vect_b[1025]; > > > > + > > > > +unsigned test4(char x, int n) > > > > +{ > > > > + unsigned ret = 0; > > > > + for (int i = 1; i < (n - 2); i+=2) > > > > + { > > > > + if (vect_a[i-1] > x || vect_a[i+2] > x) > > > > + return 1; > > > > + > > > > + vect_b[i] = x; > > > > + vect_b[i+1] = x+1; > > > > + } > > > > + return ret; > > > > +} > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > > index > > > > 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46 > > > a462238c3b5825ef 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n) > > > > return ret; > > > > } > > > > > > > > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" > > > > } } */ > > > > +/* cannot safely vectorize this due due to the group misalignment. */ > > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" > > > > 0 "vect" > } } > > > */ > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > > index > > > > a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452 > > > b79f3e0fabba8b3c0 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > > @@ -2,6 +2,7 @@ > > > > /* { dg-do compile } */ > > > > /* { dg-require-effective-target vect_early_break } */ > > > > /* { dg-require-effective-target vect_int } */ > > > > +/* { dg-require-effective-target vect_partial_vectors } */ > > > > > > > > void abort (); > > > > int a[64], b[64]; > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > > index > > > > 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f > > > 7ad915a3f81c38c4 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > > @@ -4,6 +4,7 @@ > > > > /* { dg-require-effective-target vect_int } */ > > > > /* { dg-add-options bind_pic_locally } */ > > > > /* { dg-require-effective-target vect_early_break_hw } */ > > > > +/* { dg-require-effective-target vect_partial_vectors } */ > > > > > > > > #include <stdarg.h> > > > > #include "tree-vect.h" > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > > index > > > > 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286fe > > > bea2b31cb7be8499 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > > @@ -5,8 +5,9 @@ > > > > > > > > /* { dg-additional-options "-Ofast" } */ > > > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */ > > > > +/* Multiple loads of different alignments, we can't peel this. */ > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */ > > > > > > > > void abort (); > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > > index > > > > 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea > > > 44ab4750473589a9 100644 > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > > @@ -5,8 +5,9 @@ > > > > > > > > /* { dg-additional-options "-Ofast" } */ > > > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */ > > > > +/* Multiple loads with different misalignments. Can't peel need > > > > partial loop > > > support. */ > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */ > > > > void abort (); > > > > > > > > unsigned short sa[32]; > > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > > > > index > > > > 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..3df48c8611c8fd843efb145c7c > > > 1a0afeee25edfa 100644 > > > > --- a/gcc/tree-vect-data-refs.cc > > > > +++ b/gcc/tree-vect-data-refs.cc > > > > @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences > (loop_vec_info > > > loop_vinfo) > > > > if (is_gimple_debug (stmt)) > > > > continue; > > > > > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt); > > > > + stmt_vec_info stmt_vinfo > > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt)); > > > > + stmt = STMT_VINFO_STMT (stmt_vinfo); > > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo); > > > > if (!dr_ref) > > > > continue; > > > > @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences > > > (loop_vec_info loop_vinfo) > > > > bounded by VF so accesses are within range. We only need > > > > to check > > > > the reads since writes are moved to a safe place where if > > > > we get > > > > there we know they are safe to perform. */ > > > > - if (DR_IS_READ (dr_ref) > > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref))) > > > > + if (DR_IS_READ (dr_ref)) > > > > { > > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo) > > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo)) > > > > - { > > > > - const char *msg > > > > - = "early break not supported: cannot peel " > > > > - "for alignment, vectorization would read out of " > > > > - "bounds at %G"; > > > > - return opt_result::failure_at (stmt, msg, stmt); > > > > - } > > > > - > > > > - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo); > > > > - dr_info->need_peeling_for_alignment = true; > > > > + dr_set_safe_speculative_read_required (stmt_vinfo, true); > > > > + bool inbounds = ref_within_array_bound (stmt, DR_REF > > > > (dr_ref)); > > > > + DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) > > > = inbounds; > > > > > > > > if (dump_enabled_p ()) > > > > dump_printf_loc (MSG_NOTE, vect_location, > > > > - "marking DR (read) as needing peeling > > > > for " > > > > - "alignment at %G", stmt); > > > > + "marking DR (read) as possibly needing > > > > peeling " > > > > + "for alignment at %G", stmt); > > > > } > > > > > > > > if (DR_IS_READ (dr_ref)) > > > > @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo) > > > > Compute the misalignment of the data reference DR_INFO when > > > > vectorizing > > > > with VECTYPE. > > > > > > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, > > > > *RESULT will > > > > - be set appropriately on failure (but is otherwise left unchanged). > > > > - > > > > Output: > > > > 1. initialized misalignment info for DR_INFO > > > > > > > > @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo) > > > > > > > > static void > > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info, > > > > - tree vectype, opt_result *result = > > > > nullptr) > > > > + tree vectype) > > > > { > > > > stmt_vec_info stmt_info = dr_info->stmt; > > > > vec_base_alignments *base_alignments = &vinfo->base_alignments; > > > > @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info > > > *vinfo, dr_vec_info *dr_info, > > > > = exact_div (targetm.vectorize.preferred_vector_alignment > > > > (vectype), > > > > BITS_PER_UNIT); > > > > > > > > - /* If this DR needs peeling for alignment for correctness, we must > > > > - ensure the target alignment is a constant power-of-two multiple > > > > of the > > > > - amount read per vector iteration (overriding the above hook where > > > > - necessary). */ > > > > - if (dr_info->need_peeling_for_alignment) > > > > + if (loop_vinfo > > > > + && dr_safe_speculative_read_required (stmt_info)) > > > > { > > > > - /* Vector size in bytes. */ > > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > (vectype)); > > > > - > > > > - /* We can only peel for loops, of course. */ > > > > - gcc_checking_assert (loop_vinfo); > > > > - > > > > - /* Calculate the number of vectors read per vector iteration. If > > > > - it is a power of two, multiply through to get the required > > > > - alignment in bytes. Otherwise, fail analysis since alignment > > > > - peeling wouldn't work in such a case. */ > > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > > + auto vectype_size > > > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > > > > + poly_uint64 new_alignment = vf * vectype_size; > > > > + /* If we have a grouped access we require that the alignment be > > > > N * elem. > */ > > > > if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > > - num_scalars *= DR_GROUP_SIZE (stmt_info); > > > > + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT > > > (stmt_info)); > > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype); > > > > - if (!pow2p_hwi (num_vectors)) > > > > - { > > > > - *result = opt_result::failure_at (vect_location, > > > > - "non-power-of-two num > > > > vectors %u " > > > > - "for DR needing peeling for > > > > " > > > > - "alignment at %G", > > > > - num_vectors, > > > > stmt_info->stmt); > > > > - return; > > > > - } > > > > - > > > > - safe_align *= num_vectors; > > > > - if (maybe_gt (safe_align, 4096U)) > > > > - { > > > > - pretty_printer pp; > > > > - pp_wide_integer (&pp, safe_align); > > > > - *result = opt_result::failure_at (vect_location, > > > > - "alignment required for > > > > correctness" > > > > - " (%s) may exceed page > > > > size", > > > > - pp_formatted_text (&pp)); > > > > - return; > > > > - } > > > > - > > > > - unsigned HOST_WIDE_INT multiple; > > > > - if (!constant_multiple_p (vector_alignment, safe_align, > > > > &multiple) > > > > - || !pow2p_hwi (multiple)) > > > > + unsigned HOST_WIDE_INT target_alignment; > > > > + if (new_alignment.is_constant (&target_alignment) > > > > + && pow2p_hwi (target_alignment)) > > > > { > > > > if (dump_enabled_p ()) > > > > { > > > > dump_printf_loc (MSG_NOTE, vect_location, > > > > - "forcing alignment for DR from preferred > > > > ("); > > > > - dump_dec (MSG_NOTE, vector_alignment); > > > > - dump_printf (MSG_NOTE, ") to safe align ("); > > > > - dump_dec (MSG_NOTE, safe_align); > > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt); > > > > + "alignment increased due to early break > > > > to "); > > > > + dump_dec (MSG_NOTE, new_alignment); > > > > + dump_printf (MSG_NOTE, " bytes.\n"); > > > > } > > > > - vector_alignment = safe_align; > > > > + vector_alignment = target_alignment; > > > > } > > > > } > > > > > > > > @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment > (loop_vec_info > > > loop_vinfo) > > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT > (loop_vinfo), > > > > loop_preheader_edge (loop)) > > > > || loop->inner > > > > + /* We don't currently maintaing the LCSSA for prologue peeled > > > > inversed > > > > + loops. */ > > > > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) > > > > do_peeling = false; > > > > > > > > @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment > (loop_vec_info > > > loop_vinfo) > > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt) > > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != > > > > dr_info->stmt) > > > > continue; > > > > - opt_result res = opt_result::success (); > > > > + > > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info, > > > > - STMT_VINFO_VECTYPE > > > > (dr_info->stmt), > > > > - &res); > > > > - if (!res) > > > > - return res; > > > > + STMT_VINFO_VECTYPE > > > > (dr_info->stmt)); > > > > } > > > > } > > > > > > > > @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, > > > dr_vec_info *dr_info, > > > > > > > > if (misalignment == 0) > > > > return dr_aligned; > > > > - else if (dr_info->need_peeling_for_alignment) > > > > + else if (dr_safe_speculative_read_required(stmt_info)) > > > > > > space after dr_safe_speculative_read_required > > > > > > Depending on vectorizable_load this might be a bit restrictive (it > > > makes peeling do the magic, but we could check > > > dr_safe_speculative_read_required there as well). But it's OK for > > > now at least. > > > > > > > return dr_unaligned_unsupported; > > > > > > > > /* For now assume all conditional loads/stores support unaligned > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > index > > > > 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..e0e5598babc7b119d408d8351 > > > 2ae8c6047d4924c 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -2597,6 +2597,165 @@ get_load_store_type (vec_info *vinfo, > > > stmt_vec_info stmt_info, > > > > return false; > > > > } > > > > > > > > + /* If this DR needs alignment for correctness, we must ensure the > > > > target > > > > + alignment is a constant power-of-two multiple of the amount read > > > > per > > > > + vector iteration or force masking. */ > > > > + if (dr_safe_speculative_read_required (stmt_info)) > > > > + { > > > > + /* We can only peel for loops, of course. */ > > > > + gcc_checking_assert (loop_vinfo); > > > > + > > > > + /* Check if we support the operation if early breaks are needed. > > > > Here we > > > > + must ensure that we don't access any more than the scalar code > > > > would > > > > + have. A masked operation would ensure this, so for these load > > > > types > > > > + force masking. */ > > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) > > > > + && (*memory_access_type == VMAT_GATHER_SCATTER > > > > + || *memory_access_type == VMAT_STRIDED_SLP)) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > + "early break not supported: cannot peel > > > > for " > > > > + "alignment. With non-contiguous memory > > > vectorization" > > > > + " could read out of bounds at %G ", > > > > + STMT_VINFO_STMT (stmt_info)); > > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; > > > > + } > > > > + > > > > + /* Vector size in bytes. */ > > > > + poly_uint64 safe_align; > > > > + if (nunits.is_constant ()) > > > > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype)); > > > > + else > > > > + safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR > > > (loop_vinfo), > > > > + POLY_VALUE_MAX); > > > > + > > > > + auto num_vectors = ncopies; > > > > > > You can't rely on ncopies for SLP, so in practice you'll see > > > num_vectors == 1 always (unless you force --param vect-force-slp=0). > > > I'm not sure why you need this test given you have the target alignment > > > check below? > > > > Yeah for SLP it doesn't matter, but as you said the param still exists and > > I don't > > think we've introduced any SLP only code yet have we? > > > > If you prefer I can drop it. What I don't actually remember for the non-SLP > > case > > is if the VF accounts for the ncopies growth as well? If so I agree this is > > completely > > superfluous for SLP and non-SLP. > > VF and GROUP_SIZE are correct, independent of whether we use SLP or > not, it's just that stupid ncopies vs. vec_num stuff that adds to > confusion. > > So yes, please drop it. > > > > > > > > + if (!pow2p_hwi (num_vectors)) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "non-power-of-two num vectors %u " > > > > + "for DR needing peeling for " > > > > + "alignment at %G", > > > > + num_vectors, STMT_VINFO_STMT (stmt_info)); > > > > + return false; > > > > + } > > > > + > > > > + auto target_alignment > > > > + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info)); > > > > + unsigned HOST_WIDE_INT target_align; > > > > + bool inbounds > > > > + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info)); > > > > + > > > > + /* If the scalar loop is known to be in bounds, and we're using > > > > scalar > > > > + accesses then there's no need to check further. */ > > > > + if (inbounds > > > > + && *memory_access_type == VMAT_ELEMENTWISE) > > > > + { > > > > + *alignment_support_scheme = dr_aligned; > > > > + return true; > > > > + } > > > > + > > > > + bool group_aligned = false; > > > > + if (target_alignment.is_constant (&target_align) > > > > + && nunits.is_constant ()) > > > > + { > > > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > > + auto vectype_size > > > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > > > > + poly_uint64 required_alignment = vf * vectype_size; > > > > + /* If we have a grouped access we require that the alignment > > > > be N * elem. > > > */ > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > > + required_alignment *= > > > > + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > > > + if (maybe_ne (target_alignment, required_alignment)) > > > > > > Note ->target_alignment only holds if *alignment_support_scheme == > > > dr_aligned but you don't seem to test this? It would be OK to > > > have target_alignment be a multiple of required_alignment. > > > > > > > ACK, will change. > > > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "desired alignment %wu not met. Instead > > > > got %wu " > > > > + "needed for DR alignment at %G", > > > > > > 'needed' sounds odd, omit it? > > > > > > > + required_alignment.to_constant (), > > > > + target_align, STMT_VINFO_STMT (stmt_info)); > > > > + return false; > > > > + } > > > > + > > > > + if (!pow2p_hwi (target_align)) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "non-power-of-two vector alignment %wd " > > > > + "needed for DR alignment at %G", > > > > > > likewise. > > > > > > > + target_align, STMT_VINFO_STMT (stmt_info)); > > > > + return false; > > > > + } > > > > + group_aligned = true; > > > > + } > > > > + else if (inbounds) > > > > > > so you have those early outs (return false) above, but here is a fallback? > > > Shouldn't this then be not an else if but > > > > > > if (!group_aligned && inbounds) > > > > > > ? > > > > Ack, so you mean that those cases could still be handled by partial vectors. > > Hmm that's true. > > > > > > > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; > > > > + else > > > > > > and this else if (!group_aligned)? > > > > > > > + return false; > > > > + > > > > + /* Even if uneven group sizes are aligned on the first load, the > > > > second > > > > + iteration won't be. As such reject non-power of 2 group > > > > sizes. */ > > > > > > isn't this covered above? > > > > Ack, the group_size * above on the power of two alignment check should cover > it. > > > > > > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > > + { > > > > + auto group_size = DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT > > > (stmt_info)); > > > > + if (!pow2p_hwi (group_size)) > > > > + { > > > > + if (group_aligned || inbounds) > > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = > > > true; > > > > + else > > > > + { > > > > + if (dump_enabled_p ()) > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, > > > > vect_location, > > > > + "early break not supported: uneven group > > > > size, " > > > > + "vectorization could read out of bounds at > > > > %G ", > > > > + STMT_VINFO_STMT (stmt_info)); > > > > + return false; > > > > + } > > > > + } > > > > + } > > > > + > > > > + safe_align *= num_vectors; > > > > + /* For VLA we have to insert a runtime check that the vector > > > > loads > > > > + per iterations don't exceed a page size. For now we can use > > > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */ > > > > > > And this should be a check on required_alignment computed above, > > > resulting in group_aligned = false but still handling it in the > > > inbounds case? > > > > > > > ACK, since if larger than a pagesize it can be handled with partial > > loops for the inbound case. and I guess out of bound case should be > > handled through partial vectors. > > > > > > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)) > > > > + { > > > > + if (dump_enabled_p ()) > > > > + { > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > > + "alignment required for correctness ("); > > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align); > > > > + dump_printf (MSG_NOTE, ") may exceed page size\n"); > > > > + } > > > > + return false; > > > > + } > > > > + > > > > + /* There are multiple loads that have a misalignment that we > > > > couldn't > > > > + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to > > > > + vectorize. */ > > > > + if (!STMT_VINFO_GROUPED_ACCESS (stmt_info) > > > > + && *misalignment != 0) > > > > > > that's probably the missing dr_aligned check? > > > > > > So, what I'm confused about is the following: we set ->target_alignment > > > (if possible) to sth that peeling can ensure the accesses will be > > > aligned - this then results in dr_aligned. Here we verify if the > > > access is aligned and aligned enough. > > > > > > Now as said above, that we end up with dr_unaligned_unsupported is > > > off, so we probably shouldn't use that to indicate the higher > > > desired alignment to the peeling code. > > > > > > > This is because of cases like in > > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early- > break_56.c > > > > Where > > > > /* check results: */ > > for (i = 0; i < n; i++) > > { > > if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) > > abort (); > > } > > > > Where there is no group loads, but sa and ia have different misalignments > > and > > peeling can't do anything here as there's no way to peel them to a mutual > alignment > > (it's even not something we even try). > > > > So this is to reject these cases. If we get to vectorizable_load with any > misalignment > > still set, it means we had multiple known misalignments which we can't > > vectorize. > > > > So explicitly, these cases *won't* be dr_aligned but will be marked as > > requiring > > alignment. > > Yes, but this should have been covered above already as well if you'd > actually checked for dr_aligned instead of just matching up > target_alignment (that wouldn't hold in this case) and required_alignment. > > > > > + return false; > > > > + > > > > + /* When using a group access the first element may be aligned > > > > but the > > > > + subsequent loads may not be. For LOAD_LANES since the loads > > > > are based > > > > + on the first DR then all loads in the group are aligned. For > > > > + non-LOAD_LANES this is not the case. In particular a load + > > > > blend when > > > > + there are gaps can have the non first loads issued unaligned, > > > > even > > > > + partially overlapping the memory of the first load in order to > > > > simplify > > > > + the blend. This is what the x86_64 backend does for instance. > > > > As > > > > + such only the first load in the group is aligned, the rest are > > > > not. */ > > > > + else if (STMT_VINFO_GROUPED_ACCESS (stmt_info) > > > > + && *misalignment != 0 > > > > + && *memory_access_type != VMAT_LOAD_STORE_LANES) > > > > + *alignment_support_scheme = dr_unaligned_supported; > > > > > > this looks definitely bogus, some targets cannot do unaligned accesses. > > > > > > > This is because of cases like > > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early- > break_26.c: > > > > for (int i = 2; i < 64; i+=2) > > if (b[i] != a[i] - a[i-2] > > || b[i+1] != a[i+1] - a[i-1]) > > abort (); > > > > Where we have a group load with a with a gap. On x86_64 this results in two > loads > > + a blend. But the second load isn't aligned. > > This should be either covered by the above checks already or if only > one load arrives here it's the fault of the downstream code splitting > the aligned load into something that isn't valid to speculate. > > > I guess as we put all the alignment requirements on the group leader, that > perhaps > > this code in get_load_store_type should be gated on > > > > STMT_VINFO_GROUPED_ACCESS (stmt_info) > > && DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info > > > > ? > > No, I don't think so. The code that eventually performs a > contiguous sub-group access directly should never extend > the load beyond GROUP_SIZE - or should be gated on the DR > not executed speculatively. That is, we should "fix" this > elsewhere. >
It doesn't, it's just not aligned within the range of GROUP_SIZE from what I remember. > If you have an updated patch I can look at what's wrong here if you > tell me how to reproduce (after applying the patch I suppose). Yes, applying the patch and running: /work/build/gcc/xgcc -B/work/build/gcc/ /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c -m64 -fdiagnostics-plain-output -flto -ffat-lto-objects -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -msse4.1 -lm -o ./vect-early-break_26.exe Thanks, Tamar > > > Enforcing the alignment on every group member would be wrong I think since > > that ends up higher overall alignment than they need. > > > > > So besides these issues in get_load_store_type the change looks good now. > > > > > > > Thanks for the reviews. > > > > Tamar > > > Richard. > > > > > > > + else > > > > + *alignment_support_scheme = dr_aligned; > > > > + } > > > > + > > > > if (*alignment_support_scheme == dr_unaligned_unsupported) > > > > { > > > > if (dump_enabled_p ()) > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > > > index > > > > b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca > > > 64797435b46fc 100644 > > > > --- a/gcc/tree-vectorizer.h > > > > +++ b/gcc/tree-vectorizer.h > > > > @@ -1281,7 +1281,11 @@ public: > > > > > > > > /* Set by early break vectorization when this DR needs peeling for > > > > alignment > > > > for correctness. */ > > > > - bool need_peeling_for_alignment; > > > > + bool safe_speculative_read_required; > > > > + > > > > + /* Set by early break vectorization when this DR's scalar accesses > > > > are known > > > > + to be inbounds of a known bounds loop. */ > > > > + bool scalar_access_known_in_bounds; > > > > > > > > tree base_decl; > > > > > > > > @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info) > > > > return dr_info->target_alignment; > > > > } > > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR) > > > > +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)- > > > >scalar_access_known_in_bounds > > > > + > > > > +/* Return if the stmt_vec_info requires peeling for alignment. */ > > > > +inline bool > > > > +dr_safe_speculative_read_required (stmt_vec_info stmt_info) > > > > +{ > > > > + dr_vec_info *dr_info; > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT > (stmt_info)); > > > > + else > > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > > > + > > > > + return dr_info->safe_speculative_read_required; > > > > +} > > > > + > > > > +/* Set the safe_speculative_read_required for the the stmt_vec_info, > > > > if group > > > > + access then set on the fist element otherwise set on DR directly. > > > > */ > > > > +inline void > > > > +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info, > > > > + bool requires_alignment) > > > > +{ > > > > + dr_vec_info *dr_info; > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT > (stmt_info)); > > > > + else > > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > > > + > > > > + dr_info->safe_speculative_read_required = requires_alignment; > > > > +} > > > > > > > > inline void > > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val) > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg) > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)