> On 31 Mar 2025, at 09:43, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Mon, Mar 31, 2025 at 9:41 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
>> 
>> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>>> 
>>> Ping.
>> 
>> Can you reference the patch please?  I'll note your mails have a tendency to
>> end up in my spam folder (which is auto-purged after some time).  Probably
>> a setup issue on NVIDIA's side.
> 
> Found it.  Your mails fail both DKIM and DMARC so gmail thinks you are
> phishing me.

Thanks for the review. Sorry about that. I think Mark had raised a BZ issue
somewhere tracking this; Mark, do you recall something like that?
I’m afraid I don’t know enough about how email works to address this myself,
but if there’s a more detailed writeup of the issue I can forward it to
someone internally who can help…

Kyrill

> 
> Richard.
> 
>> 
>> Richard.
>> 
>>> Thanks,
>>> Kyrill
>>> 
>>>> On 24 Mar 2025, at 14:28, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> In this testcase GCC tries to expand a VNx4BI vector:
>>>> vector(4) <signed-boolean:4> _40;
>>>> _39 = (<signed-boolean:4>) _24;
>>>> _40 = {_39, _39, _39, _39};
>>>> 
>>>> This ends up in a scalarised sequence of bitfield insert operations.
>>>> This is despite the fact that AArch64 provides a vec_duplicate pattern
>>>> specifically for vec_duplicate into VNx4BI.
>>>> 
>>>> The store_constructor code is overly conservative when trying vec_duplicate
>>>> as it sees a requested VNx4BImode and an element mode of QImode, which I
>>>> guess is the storage mode of BImode objects.
>>>> 
>>>> The vec_duplicate expander in aarch64-sve.md explicitly allows QImode
>>>> element modes so it should be safe to use it. This patch extends that mode
>>>> check to allow such expanders.
>>>> 
>>>> The testcase is heavily auto-reduced from a real application and is
>>>> nonsensical in itself, but it does demonstrate the current problematic
>>>> codegen.
>>>> 
>>>> With this patch the testcase goes from:
>>>> pfalse p15.b
>>>> str p15, [sp, #6, mul vl]
>>>> mov w0, 0
>>>> ldr w2, [sp, 12]
>>>> bfi w2, w0, 0, 4
>>>> uxtw x2, w2
>>>> bfi w2, w0, 4, 4
>>>> uxtw x2, w2
>>>> bfi w2, w0, 8, 4
>>>> uxtw x2, w2
>>>> bfi w2, w0, 12, 4
>>>> str w2, [sp, 12]
>>>> ldr p15, [sp, #6, mul vl]
>>>> 
>>>> into:
>>>> whilelo p15.s, wzr, wzr
>>>> 
>>>> The whilelo could be optimised away into a pfalse of course, but the
>>>> important part is that the bfis are gone.
>>>> 
>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>> 
>>>> Given this is a regression from GCC 13, is this ok for trunk now?
>>>> Thanks,
>>>> Kyrill
>>>> 
>>>> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>>>> 
>>>> gcc/
>>>> 
>>>> PR middle-end/119442
>>>> * expr.cc (store_constructor): Also allow element modes explicitly
>>>> accepted by target vec_duplicate pattern.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>> PR middle-end/119442
>>>> * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
>>>> 
>>>> <0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch>
>>> 
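
For readers following the thread, the difference between the two code
sequences quoted above can be modelled in plain C. This is only an
illustrative sketch, not GCC code and not part of the patch: the helper names
(bfi, dup_scalarised, dup_broadcast) are hypothetical, and the layout of four
4-bit elements packed into the low 16 bits of a word is an assumption read off
the bfi offsets (0, 4, 8, 12) in the quoted assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bitfield insert: copy the low `width` bits of
   `src` into `dst` starting at bit `lsb`, leaving the other bits alone.  */
static uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    assert(width > 0 && width < 32 && lsb + width <= 32);
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Scalarised form: one insert per element, as in the "before" sequence.
   Assumes 4 elements of 4 bits each in bits 0..15 of the word.  */
static uint32_t dup_scalarised(uint32_t old, uint32_t elt)
{
    uint32_t w = old;
    for (unsigned i = 0; i < 4; i++)
        w = bfi(w, elt, 4 * i, 4);
    return w;
}

/* Broadcast form: a single operation replicates the element into every
   lane, which is the role a vec_duplicate pattern plays.  */
static uint32_t dup_broadcast(uint32_t elt)
{
    return (elt & 0xfu) * 0x1111u;
}
```

In this model both functions produce the same low 16 bits for any element
value; the scalarised version simply takes four dependent inserts to get
there, which mirrors the chain of bfi instructions the patch removes.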
