Hi all,

In this testcase GCC tries to expand a VNx4BI vector:

  vector(4) <signed-boolean:4> _40;
  _39 = (<signed-boolean:4>) _24;
  _40 = {_39, _39, _39, _39};

This ends up in a scalarised sequence of bitfield insert operations.
This is despite the fact that AArch64 provides a vec_duplicate pattern
specifically for duplicating into VNx4BI.

The store_constructor code is overly conservative when trying
vec_duplicate: it sees a requested mode of VNx4BImode and an element mode
of QImode, which I guess is the storage mode of BImode objects, and
rejects the mismatch. The vec_duplicate expander in aarch64-sve.md
explicitly allows QImode element modes, so it should be safe to use it.
This patch extends that mode check to allow such expanders.

The testcase is heavily auto-reduced from a real application and is
nonsensical in itself, but it does demonstrate the current problematic
codegen.

With this patch the testcase goes from:

	pfalse	p15.b
	str	p15, [sp, #6, mul vl]
	mov	w0, 0
	ldr	w2, [sp, 12]
	bfi	w2, w0, 0, 4
	uxtw	x2, w2
	bfi	w2, w0, 4, 4
	uxtw	x2, w2
	bfi	w2, w0, 8, 4
	uxtw	x2, w2
	bfi	w2, w0, 12, 4
	str	w2, [sp, 12]
	ldr	p15, [sp, #6, mul vl]

into:

	whilelo	p15.s, wzr, wzr

The whilelo could of course be optimised into a pfalse, but the important
part is that the bfis are gone.

Bootstrapped and tested on aarch64-none-linux-gnu.
Given this is a regression from GCC 13, is this ok for trunk now?

Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	PR middle-end/119442
	* expr.cc (store_constructor): Also allow element modes explicitly
	accepted by target vec_duplicate pattern.

gcc/testsuite/

	PR middle-end/119442
	* gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.

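For readers less familiar with the bfi sequence quoted above, here is a rough sketch in plain C of the bit arithmetic it performs: four inserts of the same 4-bit value at bit offsets 0, 4, 8 and 12 of a 32-bit word, compared with a single broadcast. This is only an illustrative model, not code from the patch or from GCC; the helper names bfi, splat4_scalarised and splat4_broadcast are made up for the example, and it makes no claim about how SVE predicate registers are laid out in hardware.

```c
#include <stdint.h>

/* Hypothetical helper modelling "bfi Wd, Wn, lsb, width": copy the low
   WIDTH bits of SRC into DST starting at bit LSB, leaving the other
   bits of DST untouched.  Assumes 0 < width < 32.  */
static uint32_t
bfi (uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
  uint32_t mask = ((1u << width) - 1u) << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}

/* Element-by-element expansion: one bitfield insert per element,
   mirroring the four bfi instructions in the listing.  */
static uint32_t
splat4_scalarised (uint32_t word, uint32_t elt)
{
  for (unsigned i = 0; i < 4; i++)
    word = bfi (word, elt, 4 * i, 4);
  return word;
}

/* The same result computed in one step: replicate the 4-bit value
   into four adjacent 4-bit lanes and merge it into the low 16 bits.  */
static uint32_t
splat4_broadcast (uint32_t word, uint32_t elt)
{
  uint32_t lanes = (elt & 0xFu) * 0x1111u;
  return (word & ~0xFFFFu) | lanes;
}
```

Both functions return the same word for any input. The point of the sketch is only that the constructor {_39, _39, _39, _39} is a broadcast of one value, which is what a vec_duplicate expander expresses directly.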
0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch