Richard Biener <rguent...@suse.de> writes:
> On Tue, 4 Mar 2025, Tamar Christina wrote:
>
>> Hi All,
>>
>> This fixes two PRs on Early break vectorization by delaying the safety checks to
>> vectorizable_load when the VF, VMAT and vectype are all known.
>>
>> This patch does add two new restrictions:
>>
>> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>>    group sizes, as they are unaligned every n % 2 iterations and so may cross
>>    a page unwittingly.

Sorry for the drive-by comment, but: it might be worth updating the commit
message to say non-power-of-2, rather than uneven.  The patch uses the right
check, but the message made it sound like it didn't.

Thanks,
Richard

>> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
>>    we cannot peel for alignment, as the alignment requirement is quite large at
>>    GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
>>    don't support it for now.
>>
>> There are other steps documented inside the code itself so that the reasoning
>> is next to the code.
>>
>> As a fall-back, when the alignment fails we require partial vector support.
>>
>> For VLA targets like SVE return element alignment as the desired vector
>> alignment.  This means that the loads are never misaligned and so annoying it
>> won't ever need to peel.
>>
>> So what I think needs to happen in GCC 16 is that.
>>
>> 1. during vect_compute_data_ref_alignment we need to take the max of
>>    POLY_VALUE_MIN and vector_alignment.
>>
>> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
>>    check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
>>    proxy for pagesize.
>>
>> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>>    vect_determine_partial_vectors_and_peeling since the first iteration has to
>>    be partial.  If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>>    vectorize.
>>
>> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
>>    becomes true and we generate the peeled check through loop control for
>>    partial loops.  From what I can tell this won't work for
>>    LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>>    all in the compiler.  That would need to be done independently from the
>>    above.
>>
>> In any case, not GCC 15 material so I've kept the WIP patches I have
>> downstream.
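
To make the uneven versus non-power-of-2 distinction above concrete, here is
a minimal sketch in C.  It is only an illustration under stated assumptions:
the helper names, the 16-byte vector size and the 4096-byte page size are made
up for the example and are not taken from the patch, and the access stream is
assumed to start at a page-aligned address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these names
   appear in the patch.  */

/* "Uneven" read literally: the group size is odd.  */
static bool
group_size_is_odd (unsigned n)
{
  return (n & 1) != 0;
}

/* The check the commit message means: the group size is a power of 2.  */
static bool
group_size_is_pow2 (unsigned n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

/* Assume iteration K reads the bytes [K * STEP, K * STEP + STEP) starting
   from a page-aligned base.  Return the first iteration whose access
   straddles a PAGESIZE boundary, or -1 if none of the first LIMIT
   iterations does.  */
static long
first_page_crossing (size_t step, size_t pagesize, long limit)
{
  for (long k = 0; k < limit; k++)
    {
      size_t start = (size_t) k * step;
      size_t last = start + step - 1;
      if (start / pagesize != last / pagesize)
        return k;
    }
  return -1;
}
```

With 16-byte vectors, a group size of 6 is even but not a power of 2: it gives
a 96-byte footprint per iteration, and first_page_crossing (96, 4096, 1000)
returns 42.  A group size of 4 gives 64 bytes per iteration, which divides the
page size, so first_page_crossing (64, 4096, 1000) returns -1.  A plain
odd/even test would accept the group of 6, which is why the wording matters.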
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
>> -m32, -m64 and no issues.
>>
>> Ok for master?
>
> OK.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>         PR tree-optimization/118464
>>         PR tree-optimization/116855
>>         * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>>         * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>>         checks.
>>         (vect_compute_data_ref_alignment): Remove alignment checks and move to
>>         get_load_store_type, increase group access alignment.
>>         (vect_enhance_data_refs_alignment): Add note to comment needing
>>         investigating.
>>         (vect_analyze_data_refs_alignment): Likewise.
>>         (vect_supportable_dr_alignment): For group loads look at first DR.
>>         * tree-vect-stmts.cc (get_load_store_type):
>>         Perform safety checks for early break pfa.
>>         * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>>         dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>>         (need_peeling_for_alignment): Renamed to...
>>         (safe_speculative_read_required): .. This
>>         (class dr_vec_info): Add scalar_access_known_in_bounds.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR tree-optimization/118464
>>         PR tree-optimization/116855
>>         * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>>         load type is relaxed later.
>>         * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>>         * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>>         * gcc.dg/vect/vect-early-break_128.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_26.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_43.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_44.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>>         * gcc.dg/vect/vect-early-break_7.c: Likewise.
>>         * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>>         * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>>         * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>>         * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>>         * gcc.dg/vect/vect-early-break_53.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_56.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_57.c: Likewise.
>>         * gcc.dg/vect/vect-early-break_81.c: Likewise.
>>
>> ---
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 6f8bf3923863dee9ed35b0497f1ef58a65726701..a4c62e50785362c93de31ac44f4fb5cbf4d1e1ee 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -17260,7 +17260,7 @@ Maximum number of relations the oracle will register in a basic block.
>>  Work bound when discovering transitive relations from existing relations.
>>
>>  @item min-pagesize
>> -Minimum page size for warning purposes.
>> +Minimum page size for warning and early break vectorization purposes.
>>
>>  @item openacc-kernels
>>  Specify mode of OpenACC `kernels' constructs handling.
>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
>> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
>> @@ -55,7 +55,9 @@ int main()
>>        }
>>      }
>>    rephase ();
>> +#pragma GCC novector
>>    for (i = 0; i < 32; ++i)
>> +#pragma GCC novector
>>      for (j = 0; j < 3; ++j)
>>  #pragma GCC novector
>>        for (k = 0; k < 3; ++k)
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c72d4a5b04cc05c7 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
>> @@ -5,7 +5,8 @@
>>  /* { dg-additional-options "-O3" } */
>>  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* Arm and -m32 create a group size of 3 here, which we can't support yet. AArch64 makes elementwise accesses here. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { aarch64*-*-* } } } } */
>>
>>  typedef struct filter_list_entry {
>>    const char *name;
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> index 6d7fb920ec2de529a4aa1de2c4a04286989204fd..ed6baf2d451f3887076a1e9143035363128efe70 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
>> @@ -3,7 +3,8 @@
>>  /* { dg-require-effective-target vect_early_break } */
>>  /* { dg-require-effective-target vect_int } */
>>
>> -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
>> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" { target { ! vect_partial_vectors } } } } */
>>  /* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
>>
>>  #ifndef N
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +int a, b, c, d, e, f;
>> +short g[1];
>> +int main() {
>> +  int h;
>> +  while (a) {
>> +    while (h)
>> +      ;
>> +    for (b = 2; b; b--) {
>> +      while (c)
>> +        ;
>> +      f = g[a];
>> +      if (d)
>> +        break;
>> +    }
>> +    while (e)
>> +      ;
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 1; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..dd05046982524f15662be8df517716b581b8a2d9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
>> @@ -0,0 +1,25 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Alignment requirement too big, load lanes targets can't safely vectorize this. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { vect_partial_vectors || vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! { vect_partial_vectors || vect_load_lanes } } } } } */
>> +
>> +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 0; i < (n - 2); i+=2)
>> +  {
>> +    if (vect_a[i] > x || vect_a[i+2] > x)
>> +      return 1;
>> +
>> +    vect_b[i] = x;
>> +    vect_b[i+1] = x+1;
>> +  }
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..085dd9b81bb6943440f34d044cbd24ee2121657c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa11.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* Gathers and scatters are not save to speculate across early breaks. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +
>> +#define N 1024
>> +int vect_a[N];
>> +int vect_b[N];
>> +
>> +int test4(int x, int stride)
>> +{
>> + int ret = 0;
>> + for (int i = 0; i < (N / stride); i++)
>> + {
>> +   vect_b[i] += x + i;
>> +   if (vect_a[i*stride] == x)
>> +     return i;
>> +   vect_a[i] += x * vect_b[i];
>> +
>> + }
>> + return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020];
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 0; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 1; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +char string[1020] __attribute__((aligned(1)));
>> +
>> +char * find(int n, char c)
>> +{
>> +    for (int i = 0; i < n; i++) {
>> +        if (string[i] == c)
>> +            return &string[i];
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 0; i < n; i++)
>> +  {
>> +    if (vect[i] > x)
>> +      return 1;
>> +
>> +    vect[i] = x;
>> +  }
>> +  return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +
>> +unsigned test4(char x, char *vect_a, char *vect_b, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 1; i < n; i++)
>> +  {
>> +    if (vect_a[i] > x || vect_b[i] > x)
>> +      return 1;
>> +
>> +    vect_a[i] = x;
>> +  }
>> +  return ret;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..cf76c7109edce15f860cdc27e10850ef5a31fc9a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* This should be vectorizable through load_lanes and linear targets. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +
>> +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 0; i < n; i+=2)
>> +  {
>> +    if (vect_a[i] > x || vect_a[i+1] > x)
>> +      return 1;
>> +
>> +    vect_b[i] = x;
>> +    vect_b[i+1] = x+1;
>> +  }
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..25d3a62356baf127c89187b150810e4d31567c6f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 1; i < (n - 2); i+=2)
>> +  {
>> +    if (vect_a[i] > x || vect_a[i+1] > x)
>> +      return 1;
>> +
>> +    vect_b[i] = x;
>> +    vect_b[i+1] = x+1;
>> +  }
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..10eb98b726acb32a0d1de4daf202724995bfa1a6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-add-options vect_early_break } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_early_break } */
>> +/* { dg-require-effective-target vect_int } */
>> +
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +/* Group size is uneven and second group is misaligned. Needs partial vectors. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
>> +
>> +
>> +char vect_a[1025];
>> +char vect_b[1025];
>> +
>> +unsigned test4(char x, int n)
>> +{
>> +  unsigned ret = 0;
>> +  for (int i = 1; i < (n - 2); i+=2)
>> +  {
>> +    if (vect_a[i-1] > x || vect_a[i+2] > x)
>> +      return 1;
>> +
>> +    vect_b[i] = x;
>> +    vect_b[i+1] = x+1;
>> +  }
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> index babc79c74c39b5beedd293f2138f0c46846543b0..edddb44bad66aa419d097f69ca850e5eaa66e014 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
>> @@ -5,7 +5,8 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
>>
>>  #ifndef N
>>  #define N 803
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> index dec0b492ab883de6e02944a95fd554a109a68a39..8f5ccc45ce06ed36627107e080d633e55e254fa0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
>> @@ -5,7 +5,9 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>>
>>  #include <complex.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> index 039aac7fd84cf6131e1ea401b87385a32b545e67..7ac1e76f0aca37aa04a767b6034000f09aaf98b8 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
>> @@ -5,7 +5,7 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>
>>  #include <stdbool.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> index f73f3c2eb86e804803a969dab983dc9e39eed66a..483ea5f243c825d6a6c4f5aa7f86c3f9eb8b2e10 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
>> @@ -5,7 +5,7 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ia32 } } } } */
>>
>>  #include <stdbool.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> index b3f5984f682f30f79331d48a264c2cc4af3e2503..f8f84fab97ab586847000af8b89448b0885ef5fc 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
>> @@ -42,4 +42,6 @@ main ()
>>    return 0;
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> index 47d2a50218bd1b32fe43edcaaabb1079d0b26223..643016b2ccfea29ba36d65c8070f255cb8179481 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
>> @@ -41,4 +41,6 @@ main ()
>>    return 0;
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> index 8062fbbf6422af6a2e42de9574e88d411a8fb917..36fc6a6eb60fae70f8f05a3d9435f5adce025847 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x)
>>    return ret;
>>  }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_load_lanes } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
>> @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
>>    return ret;
>>  }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> +/* cannot safely vectorize this due due to the group misalignment. */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>>    return ret;
>>  }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> index 7e9f635a0b5a8f6fb5da5d7cc6a426f343af4b56..0cfa2428cc61d5f4ea0784367acea6436736970f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
>> @@ -27,4 +27,6 @@ unsigned test4(unsigned x)
>>    return ret;
>>  }
>>
>> -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
>> \ No newline at end of file
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops in function" "vect" { target { ! vect_partial_vectors } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> index a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452b79f3e0fabba8b3c0 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
>> @@ -2,6 +2,7 @@
>>  /* { dg-do compile } */
>>  /* { dg-require-effective-target vect_early_break } */
>>  /* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>
>>  void abort ();
>>  int a[64], b[64];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> index 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f7ad915a3f81c38c4 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
>> @@ -4,6 +4,7 @@
>>  /* { dg-require-effective-target vect_int } */
>>  /* { dg-add-options bind_pic_locally } */
>>  /* { dg-require-effective-target vect_early_break_hw } */
>> +/* { dg-require-effective-target vect_partial_vectors } */
>>
>>  #include <stdarg.h>
>>  #include "tree-vect.h"
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> index 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286febea2b31cb7be8499 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
>> @@ -5,8 +5,9 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads of different alignments, we can't peel this. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>>
>>  void abort ();
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> index 7b870e9c60dcac6164d879dd70c1fc07ec0221fe..c7cce81f52c80d83bd2c1face8cbd13f93834531 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
>> @@ -5,7 +5,9 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> +/* This will fail because we cannot SLP the load groups yet. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_partial_vectors } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_partial_vectors } } } } */
>>
>>  #define N 1024
>>  unsigned vect_a[N];
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> index d218a0686719fee4c167684dcf26402851b53260..34d187483320b9cc215304b73e28d45d7031516e 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
>> @@ -5,7 +5,10 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! "arm*-*-*" } } } } */
>> +/* Complex numbers read x and x+1, which on non-load lanes targets require partial loops. */
>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! "arm*-*-*" } && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { { "arm*-*-*" } || { ! vect_load_lanes } } } } } */
>> +
>>
>>  #include <complex.h>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> index 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea44ab4750473589a9 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
>> @@ -5,8 +5,9 @@
>>
>>  /* { dg-additional-options "-Ofast" } */
>>
>> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>> -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>> +/* Multiple loads with different misalignments. Can't peel need partial loop support. */
>> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
>>  void abort ();
>>
>>  unsigned short sa[32];
>> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
>> index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..9949fc3d98852399242a96095f4dae5ffe7613b3 100644
>> --- a/gcc/tree-vect-data-refs.cc
>> +++ b/gcc/tree-vect-data-refs.cc
>> @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>>            if (is_gimple_debug (stmt))
>>              continue;
>>
>> -          stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
>> +          stmt_vec_info stmt_vinfo
>> +            = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
>> +          stmt = STMT_VINFO_STMT (stmt_vinfo);
>>            auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
>>            if (!dr_ref)
>>              continue;
>> @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>>               bounded by VF so accesses are within range.  We only need to check
>>               the reads since writes are moved to a safe place where if we get
>>               there we know they are safe to perform.  */
>> -          if (DR_IS_READ (dr_ref)
>> -              && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
>> +          if (DR_IS_READ (dr_ref))
>>              {
>> -              if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
>> -                  || STMT_VINFO_STRIDED_P (stmt_vinfo))
>> -                {
>> -                  const char *msg
>> -                    = "early break not supported: cannot peel "
>> -                      "for alignment, vectorization would read out of "
>> -                      "bounds at %G";
>> -                  return opt_result::failure_at (stmt, msg, stmt);
>> -                }
>> -
>> -              dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
>> -              dr_info->need_peeling_for_alignment = true;
>> +              dr_set_safe_speculative_read_required (stmt_vinfo, true);
>> +              bool inbounds = ref_within_array_bound (stmt, DR_REF (dr_ref));
>> +              DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo)) = inbounds;
>>
>>                if (dump_enabled_p ())
>>                  dump_printf_loc (MSG_NOTE, vect_location,
>> -                                 "marking DR (read) as needing peeling for "
>> -                                 "alignment at %G", stmt);
>> +                                 "marking DR (read) as possibly needing peeling "
>> +                                 "for alignment at %G", stmt);
>>              }
>>
>>            if (DR_IS_READ (dr_ref))
>> @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
>>     Compute the misalignment of the data reference DR_INFO when vectorizing
>>     with VECTYPE.
>>
>> -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, *RESULT will
>> -   be set appropriately on failure (but is otherwise left unchanged).
>> -
>>     Output:
>>     1.
initialized misalignment info for DR_INFO >> >> @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo) >> >> static void >> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info, >> - tree vectype, opt_result *result = nullptr) >> + tree vectype) >> { >> stmt_vec_info stmt_info = dr_info->stmt; >> vec_base_alignments *base_alignments = &vinfo->base_alignments; >> @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info *vinfo, >> dr_vec_info *dr_info, >> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype), >> BITS_PER_UNIT); >> >> - /* If this DR needs peeling for alignment for correctness, we must >> - ensure the target alignment is a constant power-of-two multiple of the >> - amount read per vector iteration (overriding the above hook where >> - necessary). */ >> - if (dr_info->need_peeling_for_alignment) >> + if (loop_vinfo >> + && dr_safe_speculative_read_required (stmt_info)) >> { >> - /* Vector size in bytes. */ >> - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT >> (vectype)); >> - >> - /* We can only peel for loops, of course. */ >> - gcc_checking_assert (loop_vinfo); >> - >> - /* Calculate the number of vectors read per vector iteration. If >> - it is a power of two, multiply through to get the required >> - alignment in bytes. Otherwise, fail analysis since alignment >> - peeling wouldn't work in such a case. */ >> - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo); >> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); >> + auto vectype_size >> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); >> + poly_uint64 new_alignment = vf * vectype_size; >> + /* If we have a grouped access we require that the alignment be N * >> elem. 
*/ >> if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) >> - num_scalars *= DR_GROUP_SIZE (stmt_info); >> + new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info)); >> >> - auto num_vectors = vect_get_num_vectors (num_scalars, vectype); >> - if (!pow2p_hwi (num_vectors)) >> - { >> - *result = opt_result::failure_at (vect_location, >> - "non-power-of-two num vectors %u " >> - "for DR needing peeling for " >> - "alignment at %G", >> - num_vectors, stmt_info->stmt); >> - return; >> - } >> - >> - safe_align *= num_vectors; >> - if (maybe_gt (safe_align, 4096U)) >> - { >> - pretty_printer pp; >> - pp_wide_integer (&pp, safe_align); >> - *result = opt_result::failure_at (vect_location, >> - "alignment required for correctness" >> - " (%s) may exceed page size", >> - pp_formatted_text (&pp)); >> - return; >> - } >> - >> - unsigned HOST_WIDE_INT multiple; >> - if (!constant_multiple_p (vector_alignment, safe_align, &multiple) >> - || !pow2p_hwi (multiple)) >> + unsigned HOST_WIDE_INT target_alignment; >> + if (new_alignment.is_constant (&target_alignment) >> + && pow2p_hwi (target_alignment)) >> { >> if (dump_enabled_p ()) >> { >> dump_printf_loc (MSG_NOTE, vect_location, >> - "forcing alignment for DR from preferred ("); >> - dump_dec (MSG_NOTE, vector_alignment); >> - dump_printf (MSG_NOTE, ") to safe align ("); >> - dump_dec (MSG_NOTE, safe_align); >> - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt); >> + "alignment increased due to early break to "); >> + dump_dec (MSG_NOTE, new_alignment); >> + dump_printf (MSG_NOTE, " bytes.\n"); >> } >> - vector_alignment = safe_align; >> + vector_alignment = target_alignment; >> } >> } >> >> @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment (loop_vec_info >> loop_vinfo) >> || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT >> (loop_vinfo), >> loop_preheader_edge (loop)) >> || loop->inner >> + /* We don't currently maintain the LCSSA for prologue peeled inversed >> + loops. 
*/ >> || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) >> do_peeling = false; >> >> @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment (loop_vec_info >> loop_vinfo) >> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt) >> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt) >> continue; >> - opt_result res = opt_result::success (); >> + >> vect_compute_data_ref_alignment (loop_vinfo, dr_info, >> - STMT_VINFO_VECTYPE (dr_info->stmt), >> - &res); >> - if (!res) >> - return res; >> + STMT_VINFO_VECTYPE (dr_info->stmt)); >> } >> } >> >> @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, >> dr_vec_info *dr_info, >> >> if (misalignment == 0) >> return dr_aligned; >> - else if (dr_info->need_peeling_for_alignment) >> + else if (dr_safe_speculative_read_required (stmt_info)) >> return dr_unaligned_unsupported; >> >> /* For now assume all conditional loads/stores support unaligned >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index >> 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..743631f944884a31505a95f7a188fd4e4ca3797d >> 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info >> stmt_info, >> return false; >> } >> >> + >> + /* Checks if all scalar iterations are known to be inbounds. */ >> + bool inbounds = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info)); >> + >> + /* Check if we support the operation if early breaks are needed. Here we >> + must ensure that we don't access any more than the scalar code would >> + have. A masked operation would ensure this, so for these load types >> + force masking. 
*/ >> + if (loop_vinfo >> + && dr_safe_speculative_read_required (stmt_info) >> + && LOOP_VINFO_EARLY_BREAKS (loop_vinfo) >> + && (*memory_access_type == VMAT_GATHER_SCATTER >> + || *memory_access_type == VMAT_STRIDED_SLP)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "early break not supported: cannot peel for " >> + "alignment. With non-contiguous memory vectorization" >> + " could read out of bounds at %G ", >> + STMT_VINFO_STMT (stmt_info)); >> + if (inbounds) >> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; >> + else >> + return false; >> + } >> + >> + /* If this DR needs alignment for correctness, we must ensure the target >> + alignment is a constant power-of-two multiple of the amount read per >> + vector iteration or force masking. */ >> + if (dr_safe_speculative_read_required (stmt_info) >> + && *alignment_support_scheme == dr_aligned) >> + { >> + /* We can only peel for loops, of course. */ >> + gcc_checking_assert (loop_vinfo); >> + >> + auto target_alignment >> + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info)); >> + unsigned HOST_WIDE_INT target_align; >> + >> + bool group_aligned = false; >> + if (target_alignment.is_constant (&target_align) >> + && nunits.is_constant ()) >> + { >> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); >> + auto vectype_size >> + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); >> + poly_uint64 required_alignment = vf * vectype_size; >> + /* If we have a grouped access we require that the alignment be N * >> elem. */ >> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) >> + required_alignment *= >> + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info)); >> + if (!multiple_p (target_alignment, required_alignment)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "desired alignment %wu not met. 
Instead got %wu " >> + "for DR alignment at %G", >> + required_alignment.to_constant (), >> + target_align, STMT_VINFO_STMT (stmt_info)); >> + return false; >> + } >> + >> + if (!pow2p_hwi (target_align)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "non-power-of-two vector alignment %wd " >> + "for DR alignment at %G", >> + target_align, STMT_VINFO_STMT (stmt_info)); >> + return false; >> + } >> + >> + /* For VLA we have to insert a runtime check that the vector loads >> + per iterations don't exceed a page size. For now we can use >> + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */ >> + if (known_gt (required_alignment, (unsigned)param_min_pagesize)) >> + { >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "alignment required for correctness ("); >> + dump_dec (MSG_MISSED_OPTIMIZATION, required_alignment); >> + dump_printf (MSG_NOTE, ") may exceed page size\n"); >> + } >> + return false; >> + } >> + >> + group_aligned = true; >> + } >> + >> + /* There are multiple loads that have a misalignment that we couldn't >> + align. We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to >> + vectorize. */ >> + if (!group_aligned) >> + { >> + if (inbounds) >> + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true; >> + else >> + return false; >> + } >> + >> + /* When using a group access the first element may be aligned but the >> + subsequent loads may not be. For LOAD_LANES since the loads are based >> + on the first DR then all loads in the group are aligned. For >> + non-LOAD_LANES this is not the case. In particular a load + blend when >> + there are gaps can have the non first loads issued unaligned, even >> + partially overlapping the memory of the first load in order to simplify >> + the blend. This is what the x86_64 backend does for instance. As >> + such only the first load in the group is aligned, the rest are not. 
>> + Because of this the permutes may break the alignment requirements that >> + have been set, and as such we should for now, reject them. */ >> + if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "loads with load permutations not supported for " >> + "speculative early break loads for %G", >> + STMT_VINFO_STMT (stmt_info)); >> + return false; >> + } >> + } >> + >> if (*alignment_support_scheme == dr_unaligned_unsupported) >> { >> if (dump_enabled_p ()) >> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h >> index >> b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca64797435b46fc >> 100644 >> --- a/gcc/tree-vectorizer.h >> +++ b/gcc/tree-vectorizer.h >> @@ -1281,7 +1281,11 @@ public: >> >> /* Set by early break vectorization when this DR needs peeling for >> alignment >> for correctness. */ >> - bool need_peeling_for_alignment; >> + bool safe_speculative_read_required; >> + >> + /* Set by early break vectorization when this DR's scalar accesses are >> known >> + to be inbounds of a known bounds loop. */ >> + bool scalar_access_known_in_bounds; >> >> tree base_decl; >> >> @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info) >> return dr_info->target_alignment; >> } >> #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR) >> +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)->scalar_access_known_in_bounds >> + >> +/* Return if the stmt_vec_info requires peeling for alignment. 
*/ >> +inline bool >> +dr_safe_speculative_read_required (stmt_vec_info stmt_info) >> +{ >> + dr_vec_info *dr_info; >> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) >> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); >> + else >> + dr_info = STMT_VINFO_DR_INFO (stmt_info); >> + >> + return dr_info->safe_speculative_read_required; >> +} >> + >> +/* Set the safe_speculative_read_required for the stmt_vec_info, if >> group >> + access then set on the first element otherwise set on DR directly. */ >> +inline void >> +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info, >> + bool requires_alignment) >> +{ >> + dr_vec_info *dr_info; >> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) >> + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); >> + else >> + dr_info = STMT_VINFO_DR_INFO (stmt_info); >> + >> + dr_info->safe_speculative_read_required = requires_alignment; >> +} >> >> inline void >> set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val) >> >> >>
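For anyone following the thread, the alignment rule that the quoted hunk in get_load_store_type applies can be sketched outside of GCC roughly as below. This is only an illustration, assuming constant (non-VLA) sizes. The names `is_pow2`, `safe_read_alignment` and `same_page` are made up for this sketch and are not GCC functions, and the 4096-byte page size used in the usage note merely stands in for param_min_pagesize. The sketch computes VF * element size * group size, rejects the result if it is not a power of two or if it exceeds the page size, and otherwise returns it as the alignment the first load must have.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: true iff x is a non-zero power of two.
constexpr bool is_pow2(std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical model of the check in the patch.  The amount read per
// vector iteration is VF * element size, scaled by the group size for
// grouped accesses.  Returns the alignment the access must have for a
// speculative read to be safe, or nullopt when alignment alone cannot
// guarantee safety (the patch then needs partial vectors or gives up).
std::optional<std::uint64_t>
safe_read_alignment(std::uint64_t vf, std::uint64_t elem_size,
                    std::uint64_t group_size, std::uint64_t min_pagesize)
{
  std::uint64_t required = vf * elem_size * group_size;
  if (!is_pow2(required))
    return std::nullopt;   // e.g. group size 3 or 6: not a power of two
  if (required > min_pagesize)
    return std::nullopt;   // one iteration could span more than a page
  return required;
}

// True iff the `size` bytes starting at `addr` lie within a single page.
constexpr bool same_page(std::uint64_t addr, std::uint64_t size,
                         std::uint64_t page)
{
  return addr / page == (addr + size - 1) / page;
}
```

With a 4096-byte page, `safe_read_alignment(4, 4, 2, 4096)` yields 32, and a 32-byte read at any 32-byte-aligned address stays within one page. A group size of 6 is even but gives 96 bytes, which is not a power of two, so it is rejected; that is the distinction between "uneven" and "non-power-of-2" raised above.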