> On 31 Mar 2025, at 09:43, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 9:41 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>>>
>>> Ping.
>>
>> Can you reference the patch please?  I'll note your mails have the tendency to
>> end up in my spam folder (which is auto-purged after some time).  Probably
>> a setup issue at nvidia's side.
>
> Found it.  Your mails fail both DKIM and DMARC so gmail thinks you are
> phishing me.

Thanks for the review.
Sorry about that, I think Mark had raised a BZ issue somewhere tracking this. Mark, do you recall something like that?
I’m afraid I don’t know enough about how email works to address this, but if there’s more of a writeup on the issue I can forward it to someone internally who can help…

Kyrill

>
> Richard.
>
>>
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>> On 24 Mar 2025, at 14:28, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> In this testcase GCC tries to expand a VNx4BI vector:
>>>>   vector(4) <signed-boolean:4> _40;
>>>>   _39 = (<signed-boolean:4>) _24;
>>>>   _40 = {_39, _39, _39, _39};
>>>>
>>>> This ends up in a scalarised sequence of bitfield insert operations.
>>>> This is despite the fact that AArch64 provides a vec_duplicate pattern
>>>> specifically for vec_duplicate into VNx4BI.
>>>>
>>>> The store_constructor code is overly conservative when trying vec_duplicate
>>>> as it sees a requested VNx4BImode and an element mode of QImode, which I guess
>>>> is the storage mode of BImode objects.
>>>>
>>>> The vec_duplicate expander in aarch64-sve.md explicitly allows QImode element
>>>> modes so it should be safe to use it.  This patch extends that mode check
>>>> to allow such expanders.
>>>>
>>>> The testcase is heavily auto-reduced from a real application and is
>>>> nonsensical in itself, but it does demonstrate the current problematic codegen.
>>>>
>>>> With this patch the testcase goes from:
>>>>   pfalse p15.b
>>>>   str p15, [sp, #6, mul vl]
>>>>   mov w0, 0
>>>>   ldr w2, [sp, 12]
>>>>   bfi w2, w0, 0, 4
>>>>   uxtw x2, w2
>>>>   bfi w2, w0, 4, 4
>>>>   uxtw x2, w2
>>>>   bfi w2, w0, 8, 4
>>>>   uxtw x2, w2
>>>>   bfi w2, w0, 12, 4
>>>>   str w2, [sp, 12]
>>>>   ldr p15, [sp, #6, mul vl]
>>>>
>>>> into:
>>>>   whilelo p15.s, wzr, wzr
>>>>
>>>> The whilelo could be optimised away into a pfalse of course, but the important
>>>> part is that the bfis are gone.
>>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>
>>>> Given this is a regression from GCC 13, is this ok for trunk now?
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>>>>
>>>> gcc/
>>>>
>>>> PR middle-end/119442
>>>> * expr.cc (store_constructor): Also allow element modes explicitly
>>>> accepted by target vec_duplicate pattern.
>>>>
>>>> gcc/testsuite/
>>>>
>>>> PR middle-end/119442
>>>> * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
>>>>
>>>> <0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch>
>>>
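
For readers following the codegen discussion quoted above, here is a minimal sketch of why the scalarised expansion is wasteful. It is not the testcase from the patch and not GCC's code: it is a plain C model that assumes four 4-bit elements packed into the low 16 bits of a word, as the bfi offsets 0, 4, 8 and 12 in the quoted assembly suggest. The function names (bfi, dup4_scalarised, dup4_broadcast, check_dup4) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a bitfield insert: copy the low `width` bits of `src` into
   `dst` starting at bit `lsb`.  Assumes 0 < width < 32.  */
uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Scalarised form: one insert per element, four dependent steps.  */
uint32_t dup4_scalarised(uint32_t old, uint32_t elt)
{
    uint32_t w = old;
    for (unsigned i = 0; i < 4; i++)
        w = bfi(w, elt, 4 * i, 4);
    return w;
}

/* Broadcast form: replicate the 4-bit element into all four lanes in
   one step.  Multiplying by 0x1111 cannot carry between lanes because
   the element is masked to 4 bits first.  */
uint32_t dup4_broadcast(uint32_t old, uint32_t elt)
{
    return (old & ~0xFFFFu) | ((elt & 0xFu) * 0x1111u);
}

/* Check that both forms agree for every 4-bit element value.  */
void check_dup4(void)
{
    const uint32_t olds[] = { 0u, 0xFFFFFFFFu, 0x12345678u };
    for (unsigned o = 0; o < 3; o++)
        for (uint32_t elt = 0; elt < 16; elt++)
            assert(dup4_scalarised(olds[o], elt) == dup4_broadcast(olds[o], elt));
}
```

Both functions compute the same value; the difference is that the first needs a chain of dependent inserts while the second is a single operation, which is loosely the role a target vec_duplicate pattern plays when the expander is allowed to use it.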