On Mon, Mar 31, 2025 at 9:41 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> >
> > Ping.
>
> Can you reference the patch please?  I'll note your mails have the tendency to
> end up in my spam folder (which is auto-purged after some time).  Probably
> a setup issue at nvidia's side.

Found it.  Your mails fail both DKIM and DMARC so gmail thinks you are
phishing me.

Richard.

>
> Richard.
>
> > Thanks,
> > Kyrill
> >
> > > On 24 Mar 2025, at 14:28, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> > >
> > > Hi all,
> > >
> > > In this testcase GCC tries to expand a VNx4BI vector:
> > > vector(4) <signed-boolean:4> _40;
> > > _39 = (<signed-boolean:4>) _24;
> > > _40 = {_39, _39, _39, _39};
> > >
> > > This ends up in a scalarised sequence of bitfield insert operations.
> > > This is despite the fact that AArch64 provides a vec_duplicate pattern
> > > specifically for vec_duplicate into VNx4BI.
> > >
> > > The store_constructor code is overly conservative when trying
> > > vec_duplicate as it sees a requested VNx4BImode and an element mode
> > > of QImode, which I guess is the storage mode of BImode objects.
> > >
> > > The vec_duplicate expander in aarch64-sve.md explicitly allows QImode
> > > element modes so it should be safe to use it. This patch extends that
> > > mode check to allow such expanders.
> > >
> > > The testcase is heavily auto-reduced from a real application and is
> > > nonsensical in itself, but it does demonstrate the current problematic
> > > codegen.
> > >
> > > With this patch the testcase goes from:
> > > pfalse p15.b
> > > str p15, [sp, #6, mul vl]
> > > mov w0, 0
> > > ldr w2, [sp, 12]
> > > bfi w2, w0, 0, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 4, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 8, 4
> > > uxtw x2, w2
> > > bfi w2, w0, 12, 4
> > > str w2, [sp, 12]
> > > ldr p15, [sp, #6, mul vl]
> > >
> > > into:
> > > whilelo p15.s, wzr, wzr
> > >
> > > The whilelo could be optimised away into a pfalse of course, but the
> > > important part is that the bfis are gone.
> > >
> > > Bootstrapped and tested on aarch64-none-linux-gnu.
> > >
> > > Given this is a regression from GCC 13, is this ok for trunk now?
> > > Thanks,
> > > Kyrill
> > >
> > > Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> > >
> > > gcc/
> > >
> > > PR middle-end/119442
> > > * expr.cc (store_constructor): Also allow element modes explicitly
> > > accepted by target vec_duplicate pattern.
> > >
> > > gcc/testsuite/
> > >
> > > PR middle-end/119442
> > > * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
> > >
> > > <0001-PR-middle-end-119442-expr.cc-Fix-vec_duplicate-into-.patch>
> >
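
For readers following the thread, the difference between the scalarised
sequence and a single duplicate can be sketched in plain C. This is only an
illustrative model, not code from the patch or from GCC: the helper names
(bfi, build_scalarised, build_duplicate, check_equivalence) are hypothetical,
the four 4-bit elements are packed into the low 16 bits of an integer, and the
starting value is assumed to be zero, whereas the quoted assembly reloads w2
from the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a bitfield insert: copy the low `width` bits of `src` into
   `dst` starting at bit `lsb`, leaving all other bits of `dst` unchanged.
   Assumes width < 32. */
uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Scalarised construction: one insert per element, four in total,
   mirroring the chain of bfi instructions in the "before" output. */
uint32_t build_scalarised(uint32_t elt)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < 4; i++)
        v = bfi(v, elt, 4 * i, 4);
    return v;
}

/* Duplicate construction: broadcast the 4-bit element into all four
   lanes in one step. Multiplying by 0x1111 cannot carry between lanes
   because the element is masked to 4 bits first. */
uint32_t build_duplicate(uint32_t elt)
{
    return (elt & 0xFu) * 0x1111u;
}

/* Both constructions agree for every possible 4-bit element value. */
void check_equivalence(void)
{
    for (uint32_t elt = 0; elt < 16; elt++)
        assert(build_scalarised(elt) == build_duplicate(elt));
}
```

Both functions produce the same value; the point of the patch as described
above is that the expander can emit the single-step form directly when the
target's vec_duplicate pattern accepts the element mode.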
