On Mon, Mar 31, 2025 at 9:41 AM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> >
> > Ping.
>
> Can you reference the patch please?  I'll note your mails have the tendency to
> end up in my spam folder (which is auto-purged after some time).  Probably
> a setup issue at NVIDIA's side.

Found it.  Your mails fail both DKIM and DMARC so Gmail thinks you are phishing me.

Richard.

>
> Richard.
>
> > Thanks,
> > Kyrill
> >
> > > On 24 Mar 2025, at 14:28, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> > >
> > > Hi all,
> > >
> > > In this testcase GCC tries to expand a VNx4BI vector:
> > > vector(4) <signed-boolean:4> _40;
> > > _39 = (<signed-boolean:4>) _24;
> > > _40 = {_39, _39, _39, _39};
> > >
> > > This ends up in a scalarised sequence of bitfield insert operations.
> > > This is despite the fact that AArch64 provides a vec_duplicate pattern
> > > specifically for vec_duplicate into VNx4BI.
> > >
> > > The store_constructor code is overly conservative when trying vec_duplicate
> > > as it sees a requested VNx4BImode and an element mode of QImode, which I guess
> > > is the storage mode of BImode objects.
> > >
> > > The vec_duplicate expander in aarch64-sve.md explicitly allows QImode element
> > > modes so it should be safe to use it. This patch extends that mode check
> > > to allow such expanders.
> > >
> > > The testcase is heavily auto-reduced from a real application but in itself is
> > > nonsensical, but it does demonstrate the current problematic codegen.
> > >
> > > With this patch the testcase goes from:
> > > pfalse p15.b
> > > str p15, [sp, #6, mul vl]
> > > mov w0, 0
> > > ldr w2, [sp, 12]
> > > bfi w2, w0, 0, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 4, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 8, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 12, 4
> > > str w2, [sp, 12]
> > > ldr p15, [sp, #6, mul vl]
> > >
> > > into:
> > > whilelo p15.s, wzr, wzr
> > >
> > > The whilelo could be optimised away into a pfalse of course, but the important
> > > part is that the bfis are gone.
> > >
> > > Bootstrapped and tested on aarch64-none-linux-gnu.
> > >
> > > Given this is a regression from GCC 13, is this ok for trunk now?
> > > Thanks,
> > > Kyrill
> > >
> > > Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> > >
> > > gcc/
> > >
> > > PR middle-end/119442
> > > * expr.cc (store_constructor): Also allow element modes explicitly
> > > accepted by target vec_duplicate pattern.
> > >
> > > gcc/testsuite/
> > >
> > > PR middle-end/119442
> > > * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
> > >
> > > <0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch>
> >