On Mon, Sep 2, 2019 at 6:23 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > which is not the case with core_cost (and similar with skylake_cost):
> > >
> > >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> > >   {6, 6, 6, 6, 12},       /* cost of loading SSE registers
> > >                              in 32,64,128,256 and 512-bit */
> > >   {6, 6, 6, 6, 12},       /* cost of storing SSE registers
> > >                              in 32,64,128,256 and 512-bit */
> > >   2, 2,                   /* SSE->integer and integer->SSE moves */
> > >
> > > We have the same cost of moving between integer registers (by default
> > > set to 2), between SSE registers and between integer and SSE register
> > > sets. I think that at least the cost of moves between regsets should
> > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > that would translate to the value of 6. The value should be low enough
> > > to keep the cost below the value that forces move through the memory.
> > > Changing core register allocation cost of SSE <-> integer to:
> > >
> > > --cut here--
> > > Index: config/i386/x86-tune-costs.h
> > > ===================================================================
> > > --- config/i386/x86-tune-costs.h (revision 275281)
> > > +++ config/i386/x86-tune-costs.h (working copy)
> > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > >                              in 32,64,128,256 and 512-bit */
> > >    {6, 6, 6, 6, 12},       /* cost of storing SSE registers
> > >                              in 32,64,128,256 and 512-bit */
> > > -  2, 2,                   /* SSE->integer and
> > > integer->SSE moves */
> > > +  6, 6,                   /* SSE->integer and
> > > integer->SSE moves */
> > >    /* End of register allocator costs. */
> > >  },
> > >
> > > --cut here--
> > >
> > > still produces direct move in gcc.target/i386/minmax-6.c
> > >
> > > I think that in addition to attached patch, values between 2 and 6
> > > should be considered in benchmarking. Unfortunately, without access to
> > > regressed SPEC tests, I can't analyse these changes by myself.
> > >
> > > Uros.
> >
> > Apply similar change to skylake_cost, on skylake workstation we got
> > performance like:
> > ---------------------------
> > version                                      | 548_exchange_r score
> > gcc10_20180822:                              | 10
> > apply remove_max8                            | 8.9
> > also apply increase integer_tofrom_sse cost  | 9.69
> > -----------------------------
> > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > libgfortran.so.5.0.0.
> >
> > I found suspicious code as below, does it affect?
>
> This should be fixed after
>
> 2019-08-27  Richard Biener  <rguent...@suse.de>
>
>         * config/i386/i386-features.h
>         (general_scalar_chain::~general_scalar_chain): Add.
>         (general_scalar_chain::insns_conv): New bitmap.
>         (general_scalar_chain::n_sse_to_integer): New.
>         (general_scalar_chain::n_integer_to_sse): Likewise.
>         (general_scalar_chain::make_vector_copies): Adjust signature.
>         * config/i386/i386-features.c
>         (general_scalar_chain::general_scalar_chain): Outline,
>         initialize new members.
>         (general_scalar_chain::~general_scalar_chain): New.
>         (general_scalar_chain::mark_dual_mode_def): Record insns
>         we need to insert conversions at and count them.
>         (general_scalar_chain::compute_convert_gain): Account
>         for conversion instructions at chain boundary.
>         (general_scalar_chain::make_vector_copies): Generate a single
>         copy for a def by a specific insn.
>         (general_scalar_chain::convert_registers): First populate
>         defs_map, then make copies at out-of chain insns.
>
> where the only ??? is that we have
>
>   const int sse_to_integer;     /* cost of moving SSE register to integer.  */
>
> but not integer_to_sse.  In the hard_register sub-struct of processor_cost

Yes.

> we have both:
>
>       const int sse_to_integer; /* cost of moving SSE register to integer.  */
>       const int integer_to_sse; /* cost of moving integer register to SSE.  */
>
> IMHO that we have mostly the same kind of costs two times is odd.

They are used for different purposes (one for register allocation, one
for rtx_cost). Changing the cost for register allocation shouldn't
affect rtx_cost, which is used elsewhere.

> And the compute_convert_gain function adds up apples and oranges.
>
> > ------------------
> > modified   gcc/config/i386/i386-features.c
> > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> >    if (dump_file)
> >      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> >
> > -  /* ??? What about integer to SSE?  */
> > +  /* ??? What about integer to SSE?  */???
> >    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> >      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> > ------------------
> > --
> > BR,
> > Hongtao
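
To make the asymmetry in the quoted hunk concrete, here is a minimal
sketch of charging both directions at the chain boundary. It is only an
illustration under assumptions: struct conv_costs, boundary_cost and an
integer_to_sse field on the rtx_cost side are hypothetical names, not
the actual GCC code. The two counters are named after the
n_sse_to_integer and n_integer_to_sse members in the ChangeLog above.

```c
/* Hypothetical stand-in for the conversion costs discussed above;
   this is NOT the real struct processor_costs.  */
struct conv_costs
{
  int sse_to_integer;   /* cost of moving SSE register to integer.  */
  int integer_to_sse;   /* cost of moving integer register to SSE
                           (assumed field, for illustration only).  */
};

/* Sum the conversion cost at the chain boundary, charging each
   direction with its own cost instead of using sse_to_integer
   for both.  */
static int
boundary_cost (const struct conv_costs *c,
               int n_sse_to_integer, int n_integer_to_sse)
{
  return n_sse_to_integer * c->sse_to_integer
         + n_integer_to_sse * c->integer_to_sse;
}
```

With equal costs in both directions the result matches the single-field
computation, so a difference would only show up once a tuning sets the
two values apart.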
--
BR,
Hongtao