On Wed, 26 Feb 2025, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener <rguent...@suse.de> > > Sent: Wednesday, February 26, 2025 12:30 PM > > To: Tamar Christina <tamar.christ...@arm.com> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com> > > Subject: Re: [3/3 PATCH v3]middle-end: delay checking for alignment to load > > [PR118464] > > > > On Tue, 25 Feb 2025, Tamar Christina wrote: > > > > > Hi All, > > > > > > This fixes two PRs on Early break vectorization by delaying the safety > > > checks to > > > vectorizable_load when the VF, VMAT and vectype are all known. > > > > > > This patch does add two new restrictions: > > > > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven > > > group sizes, as they are unaligned every n % 2 iterations and so may > > > cross > > > a page unwittingly. > > > > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject > > > vectorization > > if > > > we cannot peel for alignment, as the alignment requirement is quite > > > large at > > > GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so > > > we > > > don't support it for now. > > > > > > There are other steps documented inside the code itself so that the > > > reasoning > > > is next to the code. > > > > > > Note that for VLA I have still left this fully disabled when not working > > > on a > > > fixed buffer. > > > > > > For VLA targets like SVE return element alignment as the desired vector > > > alignment. This means that the loads are never misaligned and so > > > annoying it > > > won't ever need to peel. > > > > > > So what I think needs to happen in GCC 16 is that. > > > > > > 1. during vect_compute_data_ref_alignment we need to take the max of > > > POLY_VALUE_MIN and vector_alignment. > > > > > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard > > > add a > > > check that ncopies * vectype does not exceed POLY_VALUE_MAX which we > > > use > > as a > > > proxy for pagesize. > > > > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in > > > vect_determine_partial_vectors_and_peeling since the first iteration > > > has to > > > be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to > > > vectorize. > > > > > > 4. Create a default mask to be used, so that > > vect_use_loop_mask_for_alignment_p > > > becomes true and we generate the peeled check through loop control for > > > partial loops. From what I can tell this won't work for > > > LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling > > support at > > > all in the compiler. That would need to be done independently from the > > > above. > > > > > > In any case, not GCC 15 material so I've kept the WIP patches I have > > downstream. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/118464 > > > PR tree-optimization/116855 > > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use. > > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay > > > checks. > > > (vect_compute_data_ref_alignment): Remove alignment checks and move > > to > > > get_load_store_type, increase group access alignment. > > > (vect_enhance_data_refs_alignment): Add note to comment needing > > > investigating. > > > (vect_analyze_data_refs_alignment): Likewise. > > > (vect_supportable_dr_alignment): For group loads look at first DR. > > > * tree-vect-stmts.cc (get_load_store_type): > > > Perform safety checks for early break pfa. > > > * tree-vectorizer.h (dr_set_safe_speculative_read_required, > > > dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): > > New. > > > (need_peeling_for_alignment): Renamed to... > > > (safe_speculative_read_required): .. This > > > (class dr_vec_info): Add scalar_access_known_in_bounds. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/118464 > > > PR tree-optimization/116855 > > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the > > > load type is relaxed later. > > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update. > > > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets > > > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test. > > > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test. > > > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test. > > > * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment. > > > * gcc.dg/vect/vect-early-break_53.c: Likewise. > > > * gcc.dg/vect/vect-early-break_56.c: Likewise. > > > * gcc.dg/vect/vect-early-break_57.c: Likewise. > > > * gcc.dg/vect/vect-early-break_81.c: Likewise. > > > > > > --- > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > index > > ca8e468f3f2dbf68c959f74d8fec48c79463504d..fff2874b326d605fc2656adf4ab > > 6eb5bd5d42d71 100644 > > > --- a/gcc/doc/invoke.texi > > > +++ b/gcc/doc/invoke.texi > > > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will > > register in a basic block. > > > Work bound when discovering transitive relations from existing relations. > > > > > > @item min-pagesize > > > -Minimum page size for warning purposes. > > > +Minimum page size for warning and early break vectorization purposes. > > > > > > @item openacc-kernels > > > Specify mode of OpenACC `kernels' constructs handling. > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > index > > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d > > 93c950629f3231554 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c > > > @@ -55,7 +55,9 @@ int main() > > > } > > > } > > > rephase (); > > > +#pragma GCC novector > > > for (i = 0; i < 32; ++i) > > > +#pragma GCC novector > > > for (j = 0; j < 3; ++j) > > > #pragma GCC novector > > > for (k = 0; k < 3; ++k) > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > index > > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c > > 72d4a5b04cc05c7 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c > > > @@ -5,7 +5,8 @@ > > > /* { dg-additional-options "-O3" } */ > > > /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } > > > */ > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* Arm and -m32 create a group size of 3 here, which we can't support > > > yet. > > AArch64 makes elementwise accesses here. */ > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { > > > aarch64*-*- > > * } } } } */ > > > > > > typedef struct filter_list_entry { > > > const char *name; > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a > > c83ab569fc9fbde126 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c > > > @@ -0,0 +1,25 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +int a, b, c, d, e, f; > > > +short g[1]; > > > +int main() { > > > + int h; > > > + while (a) { > > > + while (h) > > > + ; > > > + for (b = 2; b; b--) { > > > + while (c) > > > + ; > > > + f = g[a]; > > > + if (d) > > > + break; > > > + } > > > + while (e) > > > + ; > > > + } > > > + return 0; > > > +} > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..dc771186efafe25bb65490 > > da7a383ad7f6ceb0a7 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c > > > @@ -0,0 +1,19 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +char string[1020]; > > > + > > > +char * find(int n, char c) > > > +{ > > > + for (int i = 1; i < n; i++) { > > > + if (string[i] == c) > > > + return &string[i]; > > > + } > > > + return 0; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > peeling" "vect" > > } } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..82d473a279ce060c55028 > > 9c61729d9f9b56f0d2a > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c > > > @@ -0,0 +1,24 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* Alignment requirement too big, load lanes targets can't safely > > > vectorize this. > > */ > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! > > vect_load_lanes } } } } */ > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > > > peeling" > > "vect" { target { ! vect_load_lanes } } } } */ > > > + > > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int > > > n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 0; i < (n - 2); i+=2) > > > + { > > > + if (vect_a[i] > x || vect_a[i+2] > x) > > > + return 1; > > > + > > > + vect_b[i] = x; > > > + vect_b[i+1] = x+1; > > > + } > > > + return ret; > > > +} > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758 > > ca29a5f3f9d3f6e0d1 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c > > > @@ -0,0 +1,19 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +char string[1020]; > > > + > > > +char * find(int n, char c) > > > +{ > > > + for (int i = 0; i < n; i++) { > > > + if (string[i] == c) > > > + return &string[i]; > > > + } > > > + return 0; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > > > peeling" > > "vect" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..374a051b945e97eedb9be > > 9da423cf54b5e564d6f > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c > > > @@ -0,0 +1,20 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +char string[1020] __attribute__((aligned(1))); > > > + > > > +char * find(int n, char c) > > > +{ > > > + for (int i = 1; i < n; i++) { > > > + if (string[i] == c) > > > + return &string[i]; > > > + } > > > + return 0; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > peeling" "vect" > > } } */ > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25 > > 7ceea1c065fcc6ae9 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c > > > @@ -0,0 +1,20 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +char string[1020] __attribute__((aligned(1))); > > > + > > > +char * find(int n, char c) > > > +{ > > > + for (int i = 0; i < n; i++) { > > > + if (string[i] == c) > > > + return &string[i]; > > > + } > > > + return 0; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > > > peeling" > > "vect" } } */ > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..ca95be44e92e32769da1d > > 1e9b740ae54682a3d55 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c > > > @@ -0,0 +1,23 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > + > > > +unsigned test4(char x, char *vect, int n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 0; i < n; i++) > > > + { > > > + if (vect[i] > x) > > > + return 1; > > > + > > > + vect[i] = x; > > > + } > > > + return ret; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using > > > peeling" "vect" > > } } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c > > 64a61c97b1b6268743 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c > > > @@ -0,0 +1,23 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > + > > > +unsigned test4(char x, char *vect_a, char *vect_b, int n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 1; i < n; i++) > > > + { > > > + if (vect_a[i] > x || vect_b[i] > x) > > > + return 1; > > > + > > > + vect_a[i] = x; > > > + } > > > + return ret; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be > > > applied" "vect" } > > } */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f > > 50776299531824ce9c > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c > > > @@ -0,0 +1,23 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* This should be vectorizable through load_lanes and linear targets. */ > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > + > > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, > > > int n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 0; i < n; i+=2) > > > + { > > > + if (vect_a[i] > x || vect_a[i+1] > x) > > > + return 1; > > > + > > > + vect_b[i] = x; > > > + vect_b[i+1] = x+1; > > > + } > > > + return ret; > > > +} > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56 > > cecc60750394e10f2b > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c > > > @@ -0,0 +1,25 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > + > > > +char vect_a[1025]; > > > +char vect_b[1025]; > > > + > > > +unsigned test4(char x, int n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 1; i < (n - 2); i+=2) > > > + { > > > + if (vect_a[i] > x || vect_a[i+1] > x) > > > + return 1; > > > + > > > + vect_b[i] = x; > > > + vect_b[i+1] = x+1; > > > + } > > > + return ret; > > > +} > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..31e209620925353948325 > > 3efc17499a53d112894 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c > > > @@ -0,0 +1,28 @@ > > > +/* { dg-add-options vect_early_break } */ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_early_break } */ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +/* { dg-additional-options "-Ofast" } */ > > > + > > > +/* Group size is uneven, load lanes targets can't safely vectorize this. > > > */ > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > > > peeling" > > "vect" } } */ > > > + > > > + > > > +char vect_a[1025]; > > > +char vect_b[1025]; > > > + > > > +unsigned test4(char x, int n) > > > +{ > > > + unsigned ret = 0; > > > + for (int i = 1; i < (n - 2); i+=2) > > > + { > > > + if (vect_a[i-1] > x || vect_a[i+2] > x) > > > + return 1; > > > + > > > + vect_b[i] = x; > > > + vect_b[i+1] = x+1; > > > + } > > > + return ret; > > > +} > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > index > > 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46 > > a462238c3b5825ef 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c > > > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n) > > > return ret; > > > } > > > > > > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } > > > } */ > > > +/* cannot safely vectorize this due due to the group misalignment. */ > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 > > > "vect" } } > > */ > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > index > > a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452 > > b79f3e0fabba8b3c0 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c > > > @@ -2,6 +2,7 @@ > > > /* { dg-do compile } */ > > > /* { dg-require-effective-target vect_early_break } */ > > > /* { dg-require-effective-target vect_int } */ > > > +/* { dg-require-effective-target vect_partial_vectors } */ > > > > > > void abort (); > > > int a[64], b[64]; > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > index > > 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f > > 7ad915a3f81c38c4 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > > @@ -4,6 +4,7 @@ > > > /* { dg-require-effective-target vect_int } */ > > > /* { dg-add-options bind_pic_locally } */ > > > /* { dg-require-effective-target vect_early_break_hw } */ > > > +/* { dg-require-effective-target vect_partial_vectors } */ > > > > > > #include <stdarg.h> > > > #include "tree-vect.h" > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > index > > 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286fe > > bea2b31cb7be8499 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c > > > @@ -5,8 +5,9 @@ > > > > > > /* { dg-additional-options "-Ofast" } */ > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */ > > > +/* Multiple loads of different alignments, we can't peel this. */ > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */ > > > > > > void abort (); > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > index > > 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea > > 44ab4750473589a9 100644 > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c > > > @@ -5,8 +5,9 @@ > > > > > > /* { dg-additional-options "-Ofast" } */ > > > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */ > > > +/* Multiple loads with different misalignments. Can't peel need partial > > > loop > > support. */ > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */ > > > void abort (); > > > > > > unsigned short sa[32]; > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > > > index > > 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..3df48c8611c8fd843efb145c7c > > 1a0afeee25edfa 100644 > > > --- a/gcc/tree-vect-data-refs.cc > > > +++ b/gcc/tree-vect-data-refs.cc > > > @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info > > loop_vinfo) > > > if (is_gimple_debug (stmt)) > > > continue; > > > > > > - stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt); > > > + stmt_vec_info stmt_vinfo > > > + = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt)); > > > + stmt = STMT_VINFO_STMT (stmt_vinfo); > > > auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo); > > > if (!dr_ref) > > > continue; > > > @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences > > (loop_vec_info loop_vinfo) > > > bounded by VF so accesses are within range. We only need to check > > > the reads since writes are moved to a safe place where if we get > > > there we know they are safe to perform. */ > > > - if (DR_IS_READ (dr_ref) > > > - && !ref_within_array_bound (stmt, DR_REF (dr_ref))) > > > + if (DR_IS_READ (dr_ref)) > > > { > > > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo) > > > - || STMT_VINFO_STRIDED_P (stmt_vinfo)) > > > - { > > > - const char *msg > > > - = "early break not supported: cannot peel " > > > - "for alignment, vectorization would read out of " > > > - "bounds at %G"; > > > - return opt_result::failure_at (stmt, msg, stmt); > > > - } > > > - > > > - dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo); > > > - dr_info->need_peeling_for_alignment = true; > > > + dr_set_safe_speculative_read_required (stmt_vinfo, true); > > > + bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref)); > > > + DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) > > = inbounds; > > > > > > if (dump_enabled_p ()) > > > dump_printf_loc (MSG_NOTE, vect_location, > > > - "marking DR (read) as needing peeling for " > > > - "alignment at %G", stmt); > > > + "marking DR (read) as possibly needing peeling > > > " > > > + "for alignment at %G", stmt); > > > } > > > > > > if (DR_IS_READ (dr_ref)) > > > @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo) > > > Compute the misalignment of the data reference DR_INFO when > > > vectorizing > > > with VECTYPE. > > > > > > - RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, > > > *RESULT will > > > - be set appropriately on failure (but is otherwise left unchanged). > > > - > > > Output: > > > 1. initialized misalignment info for DR_INFO > > > > > > @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo) > > > > > > static void > > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info, > > > - tree vectype, opt_result *result = nullptr) > > > + tree vectype) > > > { > > > stmt_vec_info stmt_info = dr_info->stmt; > > > vec_base_alignments *base_alignments = &vinfo->base_alignments; > > > @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info > > *vinfo, dr_vec_info *dr_info, > > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype), > > > BITS_PER_UNIT); > > > > > > - /* If this DR needs peeling for alignment for correctness, we must > > > - ensure the target alignment is a constant power-of-two multiple of > > > the > > > - amount read per vector iteration (overriding the above hook where > > > - necessary). */ > > > - if (dr_info->need_peeling_for_alignment) > > > + if (loop_vinfo > > > + && dr_safe_speculative_read_required (stmt_info)) > > > { > > > - /* Vector size in bytes. */ > > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > > > (vectype)); > > > - > > > - /* We can only peel for loops, of course. */ > > > - gcc_checking_assert (loop_vinfo); > > > - > > > - /* Calculate the number of vectors read per vector iteration. If > > > - it is a power of two, multiply through to get the required > > > - alignment in bytes. Otherwise, fail analysis since alignment > > > - peeling wouldn't work in such a case. */ > > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > + auto vectype_size > > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > > > + poly_uint64 new_alignment = vf * vectype_size; > > > + /* If we have a grouped access we require that the alignment be N > > > * elem. */ > > > if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > - num_scalars *= DR_GROUP_SIZE (stmt_info); > > > + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT > > (stmt_info)); > > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype); > > > - if (!pow2p_hwi (num_vectors)) > > > - { > > > - *result = opt_result::failure_at (vect_location, > > > - "non-power-of-two num vectors %u " > > > - "for DR needing peeling for " > > > - "alignment at %G", > > > - num_vectors, stmt_info->stmt); > > > - return; > > > - } > > > - > > > - safe_align *= num_vectors; > > > - if (maybe_gt (safe_align, 4096U)) > > > - { > > > - pretty_printer pp; > > > - pp_wide_integer (&pp, safe_align); > > > - *result = opt_result::failure_at (vect_location, > > > - "alignment required for correctness" > > > - " (%s) may exceed page size", > > > - pp_formatted_text (&pp)); > > > - return; > > > - } > > > - > > > - unsigned HOST_WIDE_INT multiple; > > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple) > > > - || !pow2p_hwi (multiple)) > > > + unsigned HOST_WIDE_INT target_alignment; > > > + if (new_alignment.is_constant (&target_alignment) > > > + && pow2p_hwi (target_alignment)) > > > { > > > if (dump_enabled_p ()) > > > { > > > dump_printf_loc (MSG_NOTE, vect_location, > > > - "forcing alignment for DR from preferred ("); > > > - dump_dec (MSG_NOTE, vector_alignment); > > > - dump_printf (MSG_NOTE, ") to safe align ("); > > > - dump_dec (MSG_NOTE, safe_align); > > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt); > > > + "alignment increased due to early break to "); > > > + dump_dec (MSG_NOTE, new_alignment); > > > + dump_printf (MSG_NOTE, " bytes.\n"); > > > } > > > - vector_alignment = safe_align; > > > + vector_alignment = target_alignment; > > > } > > > } > > > > > > @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info > > loop_vinfo) > > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT > > > (loop_vinfo), > > > loop_preheader_edge (loop)) > > > || loop->inner > > > + /* We don't currently maintaing the LCSSA for prologue peeled > > > inversed > > > + loops. */ > > > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) > > > do_peeling = false; > > > > > > @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info > > loop_vinfo) > > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt) > > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt) > > > continue; > > > - opt_result res = opt_result::success (); > > > + > > > vect_compute_data_ref_alignment (loop_vinfo, dr_info, > > > - STMT_VINFO_VECTYPE (dr_info->stmt), > > > - &res); > > > - if (!res) > > > - return res; > > > + STMT_VINFO_VECTYPE (dr_info->stmt)); > > > } > > > } > > > > > > @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, > > dr_vec_info *dr_info, > > > > > > if (misalignment == 0) > > > return dr_aligned; > > > - else if (dr_info->need_peeling_for_alignment) > > > + else if (dr_safe_speculative_read_required(stmt_info)) > > > > space after dr_safe_speculative_read_required > > > > Depending on vectorizable_load this might be a bit restrictive (it > > makes peeling do the magic, but we could check > > dr_safe_speculative_read_required there as well). But it's OK for > > now at least. > > > > > return dr_unaligned_unsupported; > > > > > > /* For now assume all conditional loads/stores support unaligned > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > index > > 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..e0e5598babc7b119d408d8351 > > 2ae8c6047d4924c 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -2597,6 +2597,165 @@ get_load_store_type (vec_info *vinfo, > > stmt_vec_info stmt_info, > > > return false; > > > } > > > > > > + /* If this DR needs alignment for correctness, we must ensure the > > > target > > > + alignment is a constant power-of-two multiple of the amount read per > > > + vector iteration or force masking. */ > > > + if (dr_safe_speculative_read_required (stmt_info)) > > > + { > > > + /* We can only peel for loops, of course. */ > > > + gcc_checking_assert (loop_vinfo); > > > + > > > + /* Check if we support the operation if early breaks are needed. > > > Here we > > > + must ensure that we don't access any more than the scalar code would > > > + have. A masked operation would ensure this, so for these load types > > > + force masking. */ > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) > > > + && (*memory_access_type == VMAT_GATHER_SCATTER > > > + || *memory_access_type == VMAT_STRIDED_SLP)) > > > + { > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > + "early break not supported: cannot peel for " > > > + "alignment. With non-contiguous memory > > vectorization" > > > + " could read out of bounds at %G ", > > > + STMT_VINFO_STMT (stmt_info)); > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; > > > + } > > > + > > > + /* Vector size in bytes. */ > > > + poly_uint64 safe_align; > > > + if (nunits.is_constant ()) > > > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype)); > > > + else > > > + safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR > > (loop_vinfo), > > > + POLY_VALUE_MAX); > > > + > > > + auto num_vectors = ncopies; > > > > You can't rely on ncopies for SLP, so in practice you'll see > > num_vectors == 1 always (unless you force --param vect-force-slp=0). > > I'm not sure why you need this test given you have the target alignment > > check below? > > Yeah for SLP it doesn't matter, but as you said the param still exists and I > don't > think we've introduced any SLP only code yet have we? > > If you prefer I can drop it. What I don't actually remember for the non-SLP > case > is if the VF accounts for the ncopies growth as well? If so I agree this is > completely > superfluous for SLP and non-SLP.
VF and GROUP_SIZE are correct, independent of whether we use SLP or not, it's just that stupid ncopies vs. vec_num stuff that adds to confusion. So yes, please drop it. > > > > > + if (!pow2p_hwi (num_vectors)) > > > + { > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > + "non-power-of-two num vectors %u " > > > + "for DR needing peeling for " > > > + "alignment at %G", > > > + num_vectors, STMT_VINFO_STMT (stmt_info)); > > > + return false; > > > + } > > > + > > > + auto target_alignment > > > + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info)); > > > + unsigned HOST_WIDE_INT target_align; > > > + bool inbounds > > > + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info)); > > > + > > > + /* If the scalar loop is known to be in bounds, and we're using > > > scalar > > > + accesses then there's no need to check further. */ > > > + if (inbounds > > > + && *memory_access_type == VMAT_ELEMENTWISE) > > > + { > > > + *alignment_support_scheme = dr_aligned; > > > + return true; > > > + } > > > + > > > + bool group_aligned = false; > > > + if (target_alignment.is_constant (&target_align) > > > + && nunits.is_constant ()) > > > + { > > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > > + auto vectype_size > > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > > > + poly_uint64 required_alignment = vf * vectype_size; > > > + /* If we have a grouped access we require that the alignment be N * > > > elem. > > */ > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > + required_alignment *= > > > + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > > + if (maybe_ne (target_alignment, required_alignment)) > > > > Note ->target_alignment only holds if *alignment_support_scheme == > > dr_aligned but you don't seem to test this? It would be OK to > > have target_alignment be a multiple of required_alignment. > > > > ACK, will change. > > > > + { > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > + "desired alignment %wu not met. Instead got %wu " > > > + "needed for DR alignment at %G", > > > > 'needed' sounds odd, omit it? > > > > > + required_alignment.to_constant (), > > > + target_align, STMT_VINFO_STMT (stmt_info)); > > > + return false; > > > + } > > > + > > > + if (!pow2p_hwi (target_align)) > > > + { > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > + "non-power-of-two vector alignment %wd " > > > + "needed for DR alignment at %G", > > > > likewise. > > > > > + target_align, STMT_VINFO_STMT (stmt_info)); > > > + return false; > > > + } > > > + group_aligned = true; > > > + } > > > + else if (inbounds) > > > > so you have those early outs (return false) above, but here is a fallback? > > Shouldn't this then be not an else if but > > > > if (!group_aligned && inbounds) > > > > ? > > Ack, so you mean that those cases could still be handled by partial vectors. > Hmm that's true. > > > > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; > > > + else > > > > and this else if (!group_aligned)? > > > > > + return false; > > > + > > > + /* Even if uneven group sizes are aligned on the first load, the > > > second > > > + iteration won't be. As such reject non-power of 2 group sizes. */ > > > > isn't this covered above? > > Ack, the group_size * above on the power of two alignment check should cover > it. > > > > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > + { > > > + auto group_size = DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT > > (stmt_info)); > > > + if (!pow2p_hwi (group_size)) > > > + { > > > + if (group_aligned || inbounds) > > > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = > > true; > > > + else > > > + { > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > + "early break not supported: uneven group size, " > > > + "vectorization could read out of bounds at %G ", > > > + STMT_VINFO_STMT (stmt_info)); > > > + return false; > > > + } > > > + } > > > + } > > > + > > > + safe_align *= num_vectors; > > > + /* For VLA we have to insert a runtime check that the vector loads > > > + per iterations don't exceed a page size. For now we can use > > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */ > > > > And this should be a check on required_alignment computed above, > > resulting in group_aligned = false but still handling it in the > > inbounds case? > > > > ACK, since if larger than a pagesize it can be handled with partial > loops for the inbound case. and I guess out of bound case should be > handled through partial vectors. > > > > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize)) > > > + { > > > + if (dump_enabled_p ()) > > > + { > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > + "alignment required for correctness ("); > > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align); > > > + dump_printf (MSG_NOTE, ") may exceed page size\n"); > > > + } > > > + return false; > > > + } > > > + > > > + /* There are multiple loads that have a misalignment that we > > > couldn't > > > + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to > > > + vectorize. */ > > > + if (!STMT_VINFO_GROUPED_ACCESS (stmt_info) > > > + && *misalignment != 0) > > > > that's probably the missing dr_aligned check? > > > > So, what I'm confused about is the following: we set ->target_alignment > > (if possible) to sth that peeling can ensure the accesses will be > > aligned - this then results in dr_aligned. Here we verify if the > > access is aligned and aligned enough. > > > > Now as said above, that we end up with dr_unaligned_unsupported is > > off, so we probably shouldn't use that to indicate the higher > > desired alignment to the peeling code. > > > > This is because of cases like in > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c > > Where > > /* check results: */ > for (i = 0; i < n; i++) > { > if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) > abort (); > } > > Where there is no group loads, but sa and ia have different misalignments and > peeling can't do anything here as there's no way to peel them to a mutual > alignment > (it's even not something we even try). > > So this is to reject these cases. If we get to vectorizable_load with any > misalignment > still set, it means we had multiple known misalignments which we can't > vectorize. > > So explicitly, these cases *won't* be dr_aligned but will be marked as > requiring > alignment. Yes, but this should have been covered above already as well if you'd actually checked for dr_aligned instead of just matching up target_alignment (that wouldn't hold in this case) and required_alignment. > > > + return false; > > > + > > > + /* When using a group access the first element may be aligned but > > > the > > > + subsequent loads may not be. For LOAD_LANES since the loads are based > > > + on the first DR then all loads in the group are aligned. For > > > + non-LOAD_LANES this is not the case. In particular a load + blend when > > > + there are gaps can have the non first loads issued unaligned, even > > > + partially overlapping the memory of the first load in order to simplify > > > + the blend. This is what the x86_64 backend does for instance. As > > > + such only the first load in the group is aligned, the rest are not. */ > > > + else if (STMT_VINFO_GROUPED_ACCESS (stmt_info) > > > + && *misalignment != 0 > > > + && *memory_access_type != VMAT_LOAD_STORE_LANES) > > > + *alignment_support_scheme = dr_unaligned_supported; > > > > this looks definitely bogus, some targets cannot do unaligned accesses. > > > > This is because of cases like > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c: > > for (int i = 2; i < 64; i+=2) > if (b[i] != a[i] - a[i-2] > || b[i+1] != a[i+1] - a[i-1]) > abort (); > > Where we have a group load with a with a gap. On x86_64 this results in two > loads > + a blend. But the second load isn't aligned. This should be either covered by the above checks already or if only one load arrives here it's the fault of the downstream code splitting the aligned load into something that isn't valid to speculate. > I guess as we put all the alignment requirements on the group leader, that > perhaps > this code in get_load_store_type should be gated on > > STMT_VINFO_GROUPED_ACCESS (stmt_info) > && DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info > > ? No, I don't think so. The code that eventually performs a contiguous sub-group access directly should never extend the load beyond GROUP_SIZE - or should be gated on the DR not executed speculatively. That is, we should "fix" this elsewhere. If you have an updated patch I can look at what's wrong here if you tell me how to reproduce (after applying the patch I suppose). > Enforcing the alignment on every group member would be wrong I think since > that ends up higher overall alignment than they need. > > > So besides these issues in get_load_store_type the change looks good now. > > > > Thanks for the reviews. > > Tamar > > Richard. > > > > > + else > > > + *alignment_support_scheme = dr_aligned; > > > + } > > > + > > > if (*alignment_support_scheme == dr_unaligned_unsupported) > > > { > > > if (dump_enabled_p ()) > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > > index > > b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca > > 64797435b46fc 100644 > > > --- a/gcc/tree-vectorizer.h > > > +++ b/gcc/tree-vectorizer.h > > > @@ -1281,7 +1281,11 @@ public: > > > > > > /* Set by early break vectorization when this DR needs peeling for > > > alignment > > > for correctness. */ > > > - bool need_peeling_for_alignment; > > > + bool safe_speculative_read_required; > > > + > > > + /* Set by early break vectorization when this DR's scalar accesses are > > > known > > > + to be inbounds of a known bounds loop. */ > > > + bool scalar_access_known_in_bounds; > > > > > > tree base_decl; > > > > > > @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info) > > > return dr_info->target_alignment; > > > } > > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR) > > > +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)- > > >scalar_access_known_in_bounds > > > + > > > +/* Return if the stmt_vec_info requires peeling for alignment. */ > > > +inline bool > > > +dr_safe_speculative_read_required (stmt_vec_info stmt_info) > > > +{ > > > + dr_vec_info *dr_info; > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > > + else > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > > + > > > + return dr_info->safe_speculative_read_required; > > > +} > > > + > > > +/* Set the safe_speculative_read_required for the the stmt_vec_info, if > > > group > > > + access then set on the fist element otherwise set on DR directly. */ > > > +inline void > > > +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info, > > > + bool requires_alignment) > > > +{ > > > + dr_vec_info *dr_info; > > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > > + else > > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > > + > > > + dr_info->safe_speculative_read_required = requires_alignment; > > > +} > > > > > > inline void > > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val) > > > > > > > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)