https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96861
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> .

The difference comes from

  /* Cost the integer to sse and sse to integer moves.  */
  cost += n_sse_to_integer * ix86_cost->sse_to_integer;
  /* ???  integer_to_sse but we only have that in the RA cost table.
     Assume sse_to_integer/integer_to_sse are the same which they
     are at the moment.  */
  cost += n_integer_to_sse * ix86_cost->sse_to_integer;

Maybe we need to increase ix86_cost->sse_to_integer; I'm testing the following.

--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1769,7 +1769,7 @@ struct processor_costs skylake_cost = {
   {6, 6, 6, 10, 20},			/* cost of unaligned loads.  */
   {8, 8, 8, 8, 16},			/* cost of unaligned stores.  */
   2, 2, 4,				/* cost of moving XMM,YMM,ZMM register */
-  2,					/* cost of moving SSE register to integer.  */
+  6,					/* cost of moving SSE register to integer.  */
   20, 8,				/* Gather load static, per_elt.  */
   22, 10,				/* Gather store static, per_elt.  */
   64,					/* size of l1 cache.  */
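
To make the effect of the proposed change concrete, here is a minimal standalone sketch of the two cost statements quoted above. It is not GCC code: struct toy_costs, cross_unit_move_cost, and the two toy_skylake_* tables are hypothetical names introduced only for illustration, and the sketch models just the sse_to_integer field with the old value (2) and the value under test (6).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for GCC's struct processor_costs;
   only the field discussed in this comment is modelled.  */
struct toy_costs
{
  int sse_to_integer;	/* cost of moving SSE register to integer */
};

/* Mirrors the two quoted statements: moves in both directions are
   charged with sse_to_integer, because no separate integer_to_sse
   cost is available at this point.  */
static int
cross_unit_move_cost (const struct toy_costs *costs,
		      int n_sse_to_integer, int n_integer_to_sse)
{
  assert (costs != NULL);
  int cost = 0;
  cost += n_sse_to_integer * costs->sse_to_integer;
  cost += n_integer_to_sse * costs->sse_to_integer;
  return cost;
}

/* Illustrative tables: value before the patch and value being tested.  */
static const struct toy_costs toy_skylake_before = { 2 };
static const struct toy_costs toy_skylake_after = { 6 };
```

Under these assumptions, a candidate with one move in each direction is charged cross_unit_move_cost (&toy_skylake_before, 1, 1) = 4 before the change and cross_unit_move_cost (&toy_skylake_after, 1, 1) = 12 after it, so the penalty for crossing between the integer and SSE units triples.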