> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.

Agreed. I think the cost computation should consider the insn count under
-Os; for ABS/MIN/MAX it also needs a more accurate model to describe the
actual insn count.

Thanks for your review.

Richard Biener <richard.guent...@gmail.com> wrote on Thu, Apr 14, 2022 at 16:56:
>
> On Thu, Apr 14, 2022 at 10:31 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> >
> > > >    virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >      {
> > > >        return ((!timode_p || TARGET_64BIT)
> > > > -             && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > +             && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > +             && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> >
> > Updated. Thanks!
>
> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.
>
> At least I see the chance that there will be a case where STV improves
> code size that will regress if we simply disable it for -Os.  Like when I do
>
> typedef int v4si __attribute__((vector_size(16)));
>
> #define min(a,b) ((a)<(b)?(a):(b))
>
> v4si foo (v4si a, v4si b)
> {
>   a[0] = min (a[0], b[0]);
>   return a;
> }
>
> there's a xmm to gpr move penalty for scalar code that could go away
> (but oddly enough we're not arranging for the use of pminsd here - seems
> we're confused about vec_select/vec_merge).
>
> Richard.
>
> > gcc/ChangeLog:
> >
> >         PR target/105034
> >         * config/i386/i386-features.cc (pass_stv::gate()): Name param
> >         to fun and add optimize_function_for_speed_p (fun).
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/105034
> >         * gcc.target/i386/pr105034.c: New test.
> > ---
> >  gcc/config/i386/i386-features.cc         |  5 +++--
> >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..26be2986486 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1908,10 +1908,11 @@ public:
> >    {}
> >
> >    /* opt_pass methods: */
> > -  virtual bool gate (function *)
> > +  virtual bool gate (function *fun)
> >      {
> >        return ((!timode_p || TARGET_64BIT)
> > -             && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > +             && TARGET_STV && TARGET_SSE2 && optimize > 1
> > +             && optimize_function_for_speed_p (fun));
> >      }
> >
> >    virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c b/gcc/testsuite/gcc.target/i386/pr105034.c
> > new file mode 100644
> > index 00000000000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a) : (b))
> > +#define min(a,b) (((a) < (b))? (a) : (b))
> > +
> > +int foo(int x)
> > +{
> > +  return max(x,0);
> > +}
> > +
> > +int bar(int x)
> > +{
> > +  return min(x,0);
> > +}
> > +
> > +unsigned int baz(unsigned int x)
> > +{
> > +  return min(x,1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "xmm" } } */
> > --
> > 2.18.1
> >
> >
> > Richard Biener <richard.guent...@gmail.com> wrote on Thu, Apr 14, 2022 at 16:06:
> > >
> > > On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> > > >
> > > > >
> > > > > optimize_function_for_speed ()?
> > > > >
> > > >
> > > > Yes, updated patch with optimize_function_for_speed_p()
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/105034
> > > >         * config/i386/i386-features.cc (pass_stv::gate()): Add
> > > >         optimize_function_for_speed_p ().
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR target/105034
> > > >         * gcc.target/i386/pr105034.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386-features.cc         |  3 ++-
> > > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> > > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > > > index 6fe41c3c24f..a49c3aa1525 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -1911,7 +1911,8 @@ public:
> > > >    virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >      {
> > > >        return ((!timode_p || TARGET_64BIT)
> > > > -             && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > +             && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > +             && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> > >
> > > Richard.
> > >
> > > >      }
> > > >
> > > >    virtual unsigned int execute (function *)
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > new file mode 100644
> > > > index 00000000000..d997e26e9ed
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* PR target/105034 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Os -msse4.1" } */
> > > > +
> > > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > > +
> > > > +int foo(int x)
> > > > +{
> > > > +  return max(x,0);
> > > > +}
> > > > +
> > > > +int bar(int x)
> > > > +{
> > > > +  return min(x,0);
> > > > +}
> > > > +
> > > > +unsigned int baz(unsigned int x)
> > > > +{
> > > > +  return min(x,1);
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "xmm" } } */
> > > > --
> > > > 2.18.1
> > > >
> > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Apr 14, 2022 at 14:56:
> > > > >
> > > > > On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > From -Os point of view, stv converts scalar register to vector mode
> > > > > > which introduces extra reg conversion and increase instruction size.
> > > > > > Disabling stv under optimize_size would avoid such code size increment
> > > > > > and no need to touch ix86_size_cost that has not been tuned for long
> > > > > > time.
> > > > > >
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
> > > > > >
> > > > > > Ok for master?
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         PR target/105034
> > > > > >         * config/i386/i386-features.cc (pass_stv::gate()): Block out
> > > > > >         optimize_size.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         PR target/105034
> > > > > >         * gcc.target/i386/pr105034.c: New test.
> > > > > > ---
> > > > > >  gcc/config/i386/i386-features.cc         |  3 ++-
> > > > > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> > > > > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > > > > > index 6fe41c3c24f..f57281e672f 100644
> > > > > > --- a/gcc/config/i386/i386-features.cc
> > > > > > +++ b/gcc/config/i386/i386-features.cc
> > > > > > @@ -1911,7 +1911,8 @@ public:
> > > > > >    virtual bool gate (function *)
> > > > > >      {
> > > > > >        return ((!timode_p || TARGET_64BIT)
> > > > > > -             && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > > > +             && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > > > +             && !optimize_size);
> > > > >
> > > > > optimize_function_for_speed ()?
> > > > >
> > > > > >      }
> > > > > >
> > > > > >    virtual unsigned int execute (function *)
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..d997e26e9ed
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > > @@ -0,0 +1,23 @@
> > > > > > +/* PR target/105034 */
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-Os -msse4.1" } */
> > > > > > +
> > > > > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > > > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > > > > +
> > > > > > +int foo(int x)
> > > > > > +{
> > > > > > +  return max(x,0);
> > > > > > +}
> > > > > > +
> > > > > > +int bar(int x)
> > > > > > +{
> > > > > > +  return min(x,0);
> > > > > > +}
> > > > > > +
> > > > > > +unsigned int baz(unsigned int x)
> > > > > > +{
> > > > > > +  return min(x,1);
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "xmm" } } */
> > > > > > --
> > > > > > 2.18.1
> > > > > >
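
To make the cost-metric suggestion in this thread concrete, here is a minimal
standalone sketch in C of a per-candidate gain computation that switches
between a speed table and a size table. It is not GCC code: the names
cost_table, candidate and conversion_gain are hypothetical, and all cost
numbers are made up for illustration rather than taken from ix86_size_cost or
any tuning table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost table.  The values are illustrative only and are
   NOT GCC's real ix86 costs.  */
struct cost_table {
  int scalar_op;   /* one scalar MIN/MAX sequence (e.g. compare + cmov) */
  int vector_op;   /* one vector MIN/MAX instruction */
  int reg_move;    /* one move between a GPR and an XMM register */
};

static const struct cost_table speed_costs = { 4, 1, 1 };  /* made-up latencies */
static const struct cost_table size_costs  = { 5, 5, 4 };  /* made-up byte counts */

/* One conversion candidate: how many operations it contains and how many
   GPR<->XMM moves each form of the code would need.  */
struct candidate {
  int n_ops;
  int scalar_moves;  /* moves needed if the code stays scalar */
  int vector_moves;  /* moves needed if the code is converted to vector */
};

/* Gain of converting the candidate to vector form.  A positive result
   means the vector form is cheaper under the selected metric.  */
static int conversion_gain (const struct candidate *c, bool for_speed)
{
  assert (c != NULL);
  const struct cost_table *k = for_speed ? &speed_costs : &size_costs;
  int scalar = c->n_ops * k->scalar_op + c->scalar_moves * k->reg_move;
  int vector = c->n_ops * k->vector_op + c->vector_moves * k->reg_move;
  return scalar - vector;
}

/* Like foo/bar/baz in pr105034.c: the value lives in a GPR, so the vector
   form needs a move in and a move out.  */
static const struct candidate scalar_args = { 1, 0, 2 };

/* Like the v4si example: the value already lives in an XMM register, so
   it is the scalar form that pays for the moves.  */
static const struct candidate lane_in_xmm = { 1, 2, 0 };
```

With these made-up numbers, conversion_gain (&scalar_args, false) is negative,
so the conversion would be rejected when optimizing for size, while
conversion_gain (&lane_in_xmm, false) is positive, so the conversion would
still be taken. A gate that simply turns the pass off for size cannot tell
these two cases apart, which is the regression risk raised in the review.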