On Mon, Sep 2, 2019 at 6:23 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > which is not the case with core_cost (and similar with skylake_cost):
> > >
> > >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> > >   {6, 6, 6, 6, 12},            /* cost of loading SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   {6, 6, 6, 6, 12},            /* cost of storing SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   2, 2,                    /* SSE->integer and integer->SSE moves */
> > >
> > > We have the same cost of moving between integer registers (by default
> > > set to 2), between SSE registers and between integer and SSE register
> > > sets. I think that at least the cost of moves between regsets should
> > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > that would translate to the value of 6. The value should be low enough
> > > to keep the cost below the value that forces move through the memory.
> > > Changing core register allocation cost of SSE <-> integer to:
> > >
> > > --cut here--
> > > Index: config/i386/x86-tune-costs.h
> > > ===================================================================
> > > --- config/i386/x86-tune-costs.h        (revision 275281)
> > > +++ config/i386/x86-tune-costs.h        (working copy)
> > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > >                                            in 32,64,128,256 and 512-bit */
> > >    {6, 6, 6, 6, 12},                    /* cost of storing SSE registers
> > >                                            in 32,64,128,256 and 512-bit */
> > > -  2, 2,                                        /* SSE->integer and
> > > integer->SSE moves */
> > > +  6, 6,                                        /* SSE->integer and
> > > integer->SSE moves */
> > >    /* End of register allocator costs.  */
> > >    },
> > >
> > > --cut here--
> > >
> > > still produces direct move in gcc.target/i386/minmax-6.c
> > >
> > > I think that in addition to attached patch, values between 2 and 6
> > > should be considered in benchmarking. Unfortunately, without access to
> > > regressed SPEC tests, I can't analyse these changes by myself.
> > >
> > > Uros.
> >
> > Apply similar change to skylake_cost, on skylake workstation we got
> > performance like:
> > ---------------------------
> > version                                                            |
> > 548_exchange_r score
> > gcc10_20180822:                                           |   10
> > apply remove_max8                                       |   8.9
> > also apply increase integer_tofrom_sse cost |   9.69
> > -----------------------------
> > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > libgfortran.so.5.0.0.
> >
> > I found suspicious code as bellow, does it affect?
>
> This should be fixed after
>
> 2019-08-27  Richard Biener  <rguent...@suse.de>
>
>         * config/i386/i386-features.h
>         (general_scalar_chain::~general_scalar_chain): Add.
>         (general_scalar_chain::insns_conv): New bitmap.
>         (general_scalar_chain::n_sse_to_integer): New.
>         (general_scalar_chain::n_integer_to_sse): Likewise.
>         (general_scalar_chain::make_vector_copies): Adjust signature.
>         * config/i386/i386-features.c
>         (general_scalar_chain::general_scalar_chain): Outline,
>         initialize new members.
>         (general_scalar_chain::~general_scalar_chain): New.
>         (general_scalar_chain::mark_dual_mode_def): Record insns
>         we need to insert conversions at and count them.
>         (general_scalar_chain::compute_convert_gain): Account
>         for conversion instructions at chain boundary.
>         (general_scalar_chain::make_vector_copies): Generate a single
>         copy for a def by a specific insn.
>         (general_scalar_chain::convert_registers): First populate
>         defs_map, then make copies at out-of chain insns.
>
> where the only ???  is that we have
>
>   const int sse_to_integer;     /* cost of moving SSE register to integer.  */
>
> but not integer_to_sse.  In the hard_register sub-struct of processor_cost
Yes.
> we have both:
>
>       const int sse_to_integer; /* cost of moving SSE register to integer.  */
>       const int integer_to_sse; /* cost of moving integer register to SSE. */
>
> IMHO that we have mostly the same kind of costs two times is odd.
They are used for different purposes(one for register allocation, one
for rtx_cost).
Changing cost for register allocation shouldn't affect rtx_cost which
would be used
somewhere else.
> And the compute_convert_gain function adds up apples and oranges.
>
> > ------------------
> > modified   gcc/config/i386/i386-features.c
> > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> >    if (dump_file)
> >      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> >
> > -  /* ???  What about integer to SSE?  */
> > +  /* ???  What about integer to SSE?  */???
> >    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> >      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> > ------------------
> > --
> > BR,
> > Hongtao



--
BR,
Hongtao

Reply via email to