On Fri, 17 May 2024, Richard Sandiford wrote:
> Richard Biener via Gcc <gcc@gcc.gnu.org> writes:
> > Hi,
> >
> > I'd like to discuss how to go forward with getting the vectorizer to
> > all-SLP for this stage1. While there is a personal branch with my
> > ongoing work (users/rguenth/vect-force-slp) branches haven't proved
> > themselves working well for collaboration.
>
> Speaking for myself, the problem hasn't been so much the branch as
> lack of time. I've been pretty swamped the last eight months or so
> (except for the time that I took off, which admittedly was quite a
> bit!), and so I never even got around to properly reading and replying
> to your message after the Cauldron. It's been on the "this is important,
> I should make time to read and understand it properly" list all this time.
> Sorry about that. :(
>
> I'm hoping to have time to work/help out on SLP stuff soon.
>
> > The branch isn't ready to be merged in full but I have been picking
> > improvements to trunk last stage1 and some remaining bits in the past
> > weeks. I have refrained from merging code paths that cannot be
> > exercised on trunk.
> >
> > There are two important sets of changes on the branch, both critical
> > to get more testing on non-x86 targets.
> >
> > 1. enable single-lane SLP discovery
> > 2. avoid splitting store groups (9315bfc661432c3 and 4336060fe2db8ec
> > if you fetch the branch)
> >
> > The first point is also most annoying on the testsuite since doing
> > SLP instead of interleaving changes what we dump and thus tests
> > start to fail in random ways when you switch between both modes.
> > On the branch single-lane SLP discovery is gated with
> > --param vect-single-lane-slp.
> >
> > The branch has numerous changes to enable single-lane SLP for some
> > code paths that have SLP not implemented and where I did not bother
> > to try supporting multi-lane SLP at this point. It also adds more
> > SLP discovery entry points.
> >
> > I'm not sure how to try merging these pieces to allow others to
> > more easily help out. One possibility is to merge
> > --param vect-single-lane-slp defaulted off and pick dependent
> > changes even when they cause testsuite regressions with
> > vect-single-lane-slp=1. Alternatively adjust the testsuite by
> > adding --param vect-single-lane-slp=0 and default to 1
> > (or keep the default).
>
> FWIW, this one sounds good to me (the default to 1 version).
> I.e. mechanically add --param vect-single-lane-slp=0 to any tests
> that fail with the new default. That means that the tests that need
> fixing are easily greppable for anyone who wants to help. Sometimes
> it'll just be a test update. Sometimes it will be new vectoriser code.

OK. Meanwhile I figured the most important part is 2. from above,
since that enables the single-lane in a grouped access (also covering
single element interleaving). This will cover all problematic cases
with respect to vectorizing loads and stores. It also has less
testsuite fallout, mainly because we have a lot less coverage for
grouped stores without SLP.

So I'll try to produce a mergeable patch for part 2 and post that
for review next week.

Thanks,
Richard.

> Thanks,
> Richard
>
> > Or require a clean testsuite with
> > --param vect-single-lane-slp defaulted to 1 but keep the --param
> > for debugging (and allow FAILs with 0).
> >
> > For fun I merged just single-lane discovery of non-grouped stores
> > and have that enabled by default. On x86_64 this results in the
> > set of FAILs below.
> >
> > Any suggestions?
> >
> > Thanks,
> > Richard.
> >
> > FAIL: gcc.dg/vect/O3-pr39675-2.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Alignment of access forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-early-break_25.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times vfnmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \\txor[ql]\\t 2
> > FAIL: gcc.target/i386/pr88531-2b.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/vectorize1.c scan-tree-dump vect "vect_cst"
> > FAIL: gfortran.dg/temporary_3.f90 -O2 execution test
> > FAIL: gfortran.dg/vect/fast-math-mgrid-resid.f -O scan-tree-dump pcom "Executing predictive commoning without unrolling"
> > FAIL: gfortran.dg/vect/vect-8.f90 -O scan-tree-dump-times vect "vectorized 2[234] loops" 1
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)