On Tue, 14 Jan 2025, Christoph Müllner wrote: > On Tue, Jan 14, 2025 at 1:46 PM Richard Biener <rguent...@suse.de> wrote: > > > > On Tue, 14 Jan 2025, Christoph Müllner wrote: > > > > > As reported in PR117079, commit ab18785840d7b8 broke the test pr105493.c. > > > When looking at the generated code, we can see that the generated code > > > is vectorized differently, resulting in a reduction from 225 instructions > > > down to 109. On the performance side, no changes were measured on a 5950X. > > > > > > This patch adjusts the test condition to fit how the function gets > > > vectorized after ab18785840d7b8 (and probably further related changes). > > > > Can you adjust the comment? We didn't vectorize the first loop in GCC 14 > > but now we do. We should probably check that, dumping -vect-details > > and checking for both loops being vectorized. > > > > The slp1 scan should then be changed to scan there are no stores to tmp > > left, like with maybe > > > > scan-tree-dump-not "MEM.*tmp.* = " "slp1" > > Thanks for this idea! > The exact proposal won't work because of the "tmp" (we had before > "MEM[(uint8_t *)pix1_90(D) + 4B];" > and similar). The significant difference is that we don't have "MEM(" > anymore, but "MEM <vector". > So this works: > > /* All loops should be vectorized. */ > /* { dg-final { scan-tree-dump-not "MEM\\\[\\\(" "slp1" } } */ > /* { dg-final { scan-tree-dump "MEM <vector\\\(8\\\) unsigned char> > \\\[\[\^\]\]\*\\\]" "slp1" } } */
But this isn't a very meaningful test? As the comment suggested we now eliminate all loads + stores to 'tmp' as we vectorize both the first loop and the final reduction. In the dump I'm looking at there isn't any MEM accessing 'tmp' anymore, so we should be able to assert this? Richard. > > > > OK with those changes. > > > > Richard. > > > > > Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu> > > > > > > PR target/117079 > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/pr105493.c: Fix expected vectorization > > > --- > > > gcc/testsuite/gcc.target/i386/pr105493.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105493.c > > > b/gcc/testsuite/gcc.target/i386/pr105493.c > > > index c6fd16753cd9..fb0bf8aa0af7 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pr105493.c > > > +++ b/gcc/testsuite/gcc.target/i386/pr105493.c > > > @@ -48,4 +48,4 @@ foo ( uint8_t *pix1, int i_pix1, uint8_t *pix2, int > > > i_pix2 ) > > > > > > /* The first loop should be vectorized, which will eliminate redundant > > > stores > > > and loads. */ > > > -/* { dg-final { scan-tree-dump-times " MEM <vector\\\(4\\\) unsigned > > > int> \\\[\[\^\]\]\*\\\] = " 4 "slp1" } } */ > > > +/* { dg-final { scan-tree-dump "= MEM <vector\\\(8\\\) unsigned char> > > > \\\[\[\^\]\]\*\\\]" "slp1" } } */ > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)