On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Tue, Sep 3, 2019 at 1:33 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>
> > > > Note:
> > > > Removing limit of cost would introduce lots of regressions in SPEC2017 
> > > > as follow
> > > > --------------------------------
> > > > 531.deepsjeng_r  -7.18%
> > > > 548.exchange_r  -6.70%
> > > > 557.xz_r -6.74%
> > > > 508.namd_r -2.81%
> > > > 527.cam4_r -6.48%
> > > > 544.nab_r -3.99%
> > > >
> > > > Tested on skylake server.
> > > > -------------------------------------
> > > > How about  changing cost from 2 to 8 until we figure out a better 
> > > > number.
> > >
> > > Certainly works for me.  Note the STV code uses the "other" sse_to_integer
> > > number and the numbers in question here are those for the RA.  There's
> > > a multitude of values used in the tables here, including some a lot 
> > > larger.
> > > So the overall bumping to 8 certainly was the wrong thing to do and 
> > > instead
> > > individual numbers should have been adjusted (didn't look at the history
> > > of that bumping).
> >
> > For reference:
> >
> > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> >
> >     PR target/32413
> >     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> >     moves between MMX/SSE registers to at least 8 units to prevent
> >     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> >
> > should probably have been "twice the cost of X" or something like that
> > instead where X be some reg-reg move cost.
>
> Thanks for the reference. It looks that the patch fixes the issue in
> the wrong place, this should be solved in
> inline_secondary_memory_needed:
>
>       /* Between SSE and general, we have moves no larger than word size.  */
>       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
>            || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
>            || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>         return true;
>
> as an alternative to implement QI and HImode moves as a SImode move
> between SSE and int<->SSE registers. We have
> ix86_secondary_memory_needed_mode that extends QI and HImode secondary
> memory to SImode, so this should solve PR32413.
>
> Other than that, what to do with the bizzare property of direct moves
> that benchmark far worse than indirect moves? I was expecting that
> keeping the cost of direct inter-regset moves just a bit below the
> cost of int<->mem<->xmm, but (much ?) higher than itra-regset moves
> would prevent unwanted wandering of values between register sets,
> while still generating the direct move when needed. While this almost

I've not tested it yet.
So i'll start a test about this patch(change cost from 2-->6) with
Richard's change.
I'll keep you informed when finishing test.

> fixes the runtime regression, it is not clear to me from Hongtao Liu's
> message if  Richard's 2019-08-27 fixes the remaining regression or
> not). Liu, can you please clarify?
>
--------------------------------
531.deepsjeng_r  -7.18%
548.exchange_r  -6.70%
557.xz_r -6.74%
508.namd_r -2.81%
527.cam4_r -6.48%
544.nab_r -3.99%

Tested on skylake server.
-------------------------------------
Those regressions are comparing gcc10_20190830 to gcc10_20190824 which
are mainly caused by removing limit of 8.

> > >  For example Pentium4 has quite high bases for move
> > > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20
> > > while the opposite 12.
> > >
> > > So yes, we want to revert the patch by applying its effect to the
> > > individual cost tables so we can revisit this for the still interesting
> > > micro-architectures.
>
> Uros.



-- 
BR,
Hongtao

Reply via email to