Ping.

Thanks,
Kyrill

> On 24 Mar 2025, at 14:28, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> 
> Hi all,
> 
> In this testcase GCC tries to expand a VNx4BI vector:
> vector(4) <signed-boolean:4> _40;
> _39 = (<signed-boolean:4>) _24;
> _40 = {_39, _39, _39, _39};
> 
> This ends up in a scalarised sequence of bitfield insert operations.
> This is despite the fact that AArch64 provides a vec_duplicate pattern
> specifically for duplicating into VNx4BI.
> 
> The store_constructor code is overly conservative when trying vec_duplicate
> as it sees a requested VNx4BImode and an element mode of QImode, which I guess
> is the storage mode of BImode objects.
> 
> The vec_duplicate expander in aarch64-sve.md explicitly allows QImode element
> modes so it should be safe to use it. This patch extends that mode check
> to allow such expanders.
> 
> The testcase is heavily auto-reduced from a real application and is
> nonsensical in itself, but it does demonstrate the current problematic codegen.
> 
> With this patch the testcase goes from:
> pfalse p15.b
> str p15, [sp, #6, mul vl]
> mov w0, 0
> ldr w2, [sp, 12]
> bfi w2, w0, 0, 4
> uxtw x2, w2
> bfi w2, w0, 4, 4
> uxtw x2, w2
> bfi w2, w0, 8, 4
> uxtw x2, w2
> bfi w2, w0, 12, 4
> str w2, [sp, 12]
> ldr p15, [sp, #6, mul vl]
> 
> into:
> whilelo p15.s, wzr, wzr
> 
> The whilelo could be optimised away into a pfalse of course, but the important
> part is that the bfis are gone.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Given this is a regression from GCC 13, is this ok for trunk now?
> Thanks,
> Kyrill
> 
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> 
> gcc/
> 
> PR middle-end/119442
> * expr.cc (store_constructor): Also allow element modes explicitly
> accepted by target vec_duplicate pattern.
> 
> gcc/testsuite/
> 
> PR middle-end/119442
> * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
> 
> <0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch>
