On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 14 Jan 2025, Christoph Müllner wrote:
>
> > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c.
> > When looking at the generated code, we can see that the generated code
> > is vectorized differently, resulting in a reduction from 225 instructions
> > down to 109. On the performance side, no changes were measured on a 5950X.
> >
> > This patch adjusts the test condition to fit how the function gets
> > vectorized after ab18785840d7b8 (and probably further related changes).
>
> Can you adjust the comment?  We didn't vectorize the first loop in GCC 14
> but now we do.  We should probably check that, dumping -vect-details
> and checking for both loops being vectorized.
>
> The slp1 scan should then be changed to scan there are no stores to tmp
> left, like with maybe
>
>  scan-tree-dump-not "MEM.*tmp.* = " "slp1"

Thanks for this idea!
The exact proposal won't work because of the "tmp" (we had before
"MEM[(uint8_t *)pix1_90(D) + 4B];"
and similar). The significant difference is that we don't have "MEM("
anymore, but "MEM <vector".
So this works:

/* All loops should be vectorized.  */
/* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */
/* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char>
\\\[\[\^\]\]\*\\\]" "slp1" } } */

>
> OK with those changes.
>
> Richard.
>
> > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
> >
> >       PR target/117079
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/pr105493.c: Fix expected vectorization
> > ---
> >  gcc/testsuite/gcc.target/i386/pr105493.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c 
> > b/gcc/testsuite/gcc.target/i386/pr105493.c
> > index c6fd16753cd9..fb0bf8aa0af7 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr105493.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c
> > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int 
> > i_pix2 )
> >
> >  /* The first loop should be vectorized, which will eliminate redundant 
> > stores
> >     and loads.  */
> > -/* { dg-final { scan-tree-dump-times "  MEM <vector\\\(4\\\) unsigned int> 
> > \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> 
> > \\\[\[\^\]\]\*\\\]" "slp1" } } */
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to