On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguent...@suse.de> wrote: > > On Tue, 14 Jan 2025, Christoph Müllner wrote: > > > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c. > > When looking at the generated code, we can see that the generated code > > is vectorized differently, resulting in a reduction from 225 instructions > > down to 109. On the performance side, no changes were measured on a 5950X. > > > > This patch adjusts the test condition to fit how the function gets > > vectorized after ab18785840d7b8 (and probably further related changes). > > Can you adjust the comment? We didn't vectorize the first loop in GCC 14 > but now we do. We should probably check that, dumping -vect-details > and checking for both loops being vectorized. > > The slp1 scan should then be changed to scan there are no stores to tmp > left, like with maybe > > scan-tree-dump-not "MEM.*tmp.* = " "slp1"
Thanks for this idea! The exact proposal won't work because of the "tmp" (we had before "MEM[(uint8_t *)pix1_90(D) + 4B];" and similar). The significant difference is that we don't have "MEM(" anymore, but "MEM <vector". So this works: /* All loops should be vectorized. */ /* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */ /* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char> \\\[\[\^\]\]\*\\\]" "slp1" } } */ > > OK with those changes. > > Richard. > > > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu> > > > > PR target/117079 > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr105493.c: Fix expected vectorization > > --- > > gcc/testsuite/gcc.target/i386/pr105493.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c > > b/gcc/testsuite/gcc.target/i386/pr105493.c > > index c6fd16753cd9..fb0bf8aa0af7 100644 > > --- a/gcc/testsuite/gcc.target/i386/pr105493.c > > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c > > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int > > i_pix2 ) > > > > /* The first loop should be vectorized, which will eliminate redundant > > stores > > and loads. */ > > -/* { dg-final { scan-tree-dump-times " MEM <vector\\\(4\\\) unsigned int> > > \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */ > > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> > > \\\[\[\^\]\]\*\\\]" "slp1" } } */ > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)