> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.

Agreed. I think the cost computation should take the insn count into
account under -Os, and for ABS/MIN/MAX it needs a more accurate model of
the actual insn count.
Thanks for your review.
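
To make the size-versus-speed cost idea above concrete, here is a minimal
sketch. It is a toy model, not GCC's actual STV cost code: the names
op_cost, opt_goal, metric and conversion_gain are all hypothetical, and a
real model would have to derive the numbers from the target cost tables.

```cpp
// Toy model only: none of these names exist in GCC.
struct op_cost
{
  int size_bytes;  // assumed encoded size of the instruction sequence
  int cycles;      // assumed rough latency of the sequence
};

enum class opt_goal { speed, size };

// Select the metric that matches the optimization goal.
constexpr int metric (const op_cost &c, opt_goal goal)
{
  return goal == opt_goal::size ? c.size_bytes : c.cycles;
}

// Gain of converting a scalar chain to vector form.  The vector form pays
// for its own instructions plus the scalar<->vector register moves.
// A positive result means the conversion is worthwhile under GOAL.
constexpr int conversion_gain (const op_cost &scalar, const op_cost &vector,
                               const op_cost &moves, opt_goal goal)
{
  return metric (scalar, goal)
         - (metric (vector, goal) + metric (moves, goal));
}
```

With made-up costs such as scalar = {7, 4}, vector = {5, 1} and
moves = {8, 2}, the gain is positive for opt_goal::speed and negative for
opt_goal::size, so the same candidate would be converted at -O2 and left
alone at -Os without gating the whole pass.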

Richard Biener <richard.guent...@gmail.com> wrote on Thu, Apr 14, 2022 at 16:56:
>
> On Thu, Apr 14, 2022 at 10:31 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
> >
> > > >    virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >      {
> > > >        return ((!timode_p || TARGET_64BIT)
> > > > -       && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > +       && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > +       && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> >
> > Updated. Thanks!
>
> As a general comment it would be nicer if the cost metric itself would focus
> on size costs when optimizing for size and speed costs when optimizing for
> speed so that individual STV opportunities can be enabled/disabled based
> on it.
>
> At least I see the chance that there will be a case where STV improves
> code size that will regress if we simply disable it for -Os.  Like when I do
>
> typedef int v4si __attribute__((vector_size(16)));
>
> #define min(a,b) ((a)<(b)?(a):(b))
>
> v4si foo (v4si a, v4si b)
> {
>   a[0] = min (a[0], b[0]);
>   return a;
> }
>
> there's an xmm to gpr move penalty for scalar code that could go away
> (but oddly enough we're not arranging for the use of pminsd here - seems
> we're confused about vec_select/vec_merge).
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR target/105034
> > * config/i386/i386-features.cc (pass_stv::gate()): Name param
> > to fun and add optimize_function_for_speed_p (fun).
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/105034
> > * gcc.target/i386/pr105034.c: New test.
> > ---
> >  gcc/config/i386/i386-features.cc         |  5 +++--
> >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..26be2986486 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1908,10 +1908,11 @@ public:
> >    {}
> >
> >    /* opt_pass methods: */
> > -  virtual bool gate (function *)
> > +  virtual bool gate (function *fun)
> >      {
> >        return ((!timode_p || TARGET_64BIT)
> > -      && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > +      && TARGET_STV && TARGET_SSE2 && optimize > 1
> > +      && optimize_function_for_speed_p (fun));
> >      }
> >
> >    virtual unsigned int execute (function *)
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > new file mode 100644
> > index 00000000000..d997e26e9ed
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/105034 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -msse4.1" } */
> > +
> > +#define max(a,b) (((a) > (b))? (a) : (b))
> > +#define min(a,b) (((a) < (b))? (a) : (b))
> > +
> > +int foo(int x)
> > +{
> > +  return max(x,0);
> > +}
> > +
> > +int bar(int x)
> > +{
> > +  return min(x,0);
> > +}
> > +
> > +unsigned int baz(unsigned int x)
> > +{
> > +  return min(x,1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "xmm" } } */
> > --
> > 2.18.1
> >
> >
> > Richard Biener <richard.guent...@gmail.com> wrote on Thu, Apr 14, 2022 at 16:06:
> > >
> > > On Thu, Apr 14, 2022 at 9:55 AM Hongyu Wang <wwwhhhyyy...@gmail.com> 
> > > wrote:
> > > >
> > > > >
> > > > > optimize_function_for_speed ()?
> > > > >
> > > >
> > > > Yes, updated patch with optimize_function_for_speed_p()
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/105034
> > > > * config/i386/i386-features.cc (pass_stv::gate()): Add
> > > >   optimize_function_for_speed_p ().
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/105034
> > > > * gcc.target/i386/pr105034.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386-features.cc         |  3 ++-
> > > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> > > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.cc 
> > > > b/gcc/config/i386/i386-features.cc
> > > > index 6fe41c3c24f..a49c3aa1525 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -1911,7 +1911,8 @@ public:
> > > >    virtual bool gate (function *)
> > >
> > > please name the parameter ...
> > >
> > > >      {
> > > >        return ((!timode_p || TARGET_64BIT)
> > > > -       && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > +       && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > +       && optimize_function_for_speed_p (cfun));
> > >
> > > ... and use it here instead of referencing 'cfun'
> > >
> > > Richard.
> > >
> > > >      }
> > > >
> > > >    virtual unsigned int execute (function *)
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > new file mode 100644
> > > > index 00000000000..d997e26e9ed
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* PR target/105034 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Os -msse4.1" } */
> > > > +
> > > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > > +
> > > > +int foo(int x)
> > > > +{
> > > > +  return max(x,0);
> > > > +}
> > > > +
> > > > +int bar(int x)
> > > > +{
> > > > +  return min(x,0);
> > > > +}
> > > > +
> > > > +unsigned int baz(unsigned int x)
> > > > +{
> > > > +  return min(x,1);
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not "xmm" } } */
> > > > --
> > > > 2.18.1
> > > >
> > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Apr 14, 2022 at 14:56:
> > > > >
> > > > > On Thu, Apr 14, 2022 at 3:18 AM Hongyu Wang via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > From the -Os point of view, STV converts scalar registers to vector
> > > > > > mode, which introduces extra register conversions and increases
> > > > > > instruction size. Disabling STV under optimize_size avoids this code
> > > > > > size increase, and there is no need to touch ix86_size_cost, which has
> > > > > > not been tuned for a long time.
> > > > > >
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,},
> > > > > >
> > > > > > Ok for master?
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         PR target/105034
> > > > > >         * config/i386/i386-features.cc (pass_stv::gate()): Block out
> > > > > >         optimize_size.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         PR target/105034
> > > > > >         * gcc.target/i386/pr105034.c: New test.
> > > > > > ---
> > > > > >  gcc/config/i386/i386-features.cc         |  3 ++-
> > > > > >  gcc/testsuite/gcc.target/i386/pr105034.c | 23 +++++++++++++++++++++++
> > > > > >  2 files changed, 25 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386-features.cc 
> > > > > > b/gcc/config/i386/i386-features.cc
> > > > > > index 6fe41c3c24f..f57281e672f 100644
> > > > > > --- a/gcc/config/i386/i386-features.cc
> > > > > > +++ b/gcc/config/i386/i386-features.cc
> > > > > > @@ -1911,7 +1911,8 @@ public:
> > > > > >    virtual bool gate (function *)
> > > > > >      {
> > > > > >        return ((!timode_p || TARGET_64BIT)
> > > > > > -             && TARGET_STV && TARGET_SSE2 && optimize > 1);
> > > > > > +             && TARGET_STV && TARGET_SSE2 && optimize > 1
> > > > > > +             && !optimize_size);
> > > > >
> > > > > optimize_function_for_speed ()?
> > > > >
> > > > > >      }
> > > > > >
> > > > > >    virtual unsigned int execute (function *)
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105034.c 
> > > > > > b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..d997e26e9ed
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr105034.c
> > > > > > @@ -0,0 +1,23 @@
> > > > > > +/* PR target/105034 */
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-Os -msse4.1" } */
> > > > > > +
> > > > > > +#define max(a,b) (((a) > (b))? (a) : (b))
> > > > > > +#define min(a,b) (((a) < (b))? (a) : (b))
> > > > > > +
> > > > > > +int foo(int x)
> > > > > > +{
> > > > > > +  return max(x,0);
> > > > > > +}
> > > > > > +
> > > > > > +int bar(int x)
> > > > > > +{
> > > > > > +  return min(x,0);
> > > > > > +}
> > > > > > +
> > > > > > +unsigned int baz(unsigned int x)
> > > > > > +{
> > > > > > +  return min(x,1);
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-assembler-not "xmm" } } */
> > > > > > --
> > > > > > 2.18.1
> > > > > >
