https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #14)
>
> Counting latencies, I think vinserti64x2 is 1 cycle, while vpinst is an
> integer->SSE move that is slower and set to 4 cycles.
> Overall it is wrong that we use addss cost to estimate vec_construct:
>
>       case vec_construct:
>         {
>           int n = TYPE_VECTOR_SUBPARTS (vectype);
>           /* N - 1 element inserts into an SSE vector, the possible
>              GPR -> XMM move is accounted for in add_stmt_cost.  */
>           if (GET_MODE_BITSIZE (mode) <= 128)
>             return (n - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
>           else if (GET_MODE_BITSIZE (mode) == 256)
>             return ((n - 2) * ix86_cost->sse_op
>                     + ix86_vec_cost (mode, ix86_cost->addss));
>           /* One vinserti64x4 and two vinserti128 for combining SSE
>              and AVX256 vectors to AVX512.  */
>           else if (GET_MODE_BITSIZE (mode) == 512)
>             return ((n - 4) * ix86_cost->sse_op
>                     + 3 * ix86_vec_cost (mode, ix86_cost->addss));
>           gcc_unreachable ();
>         }
>
> I think we may want to use ix86_cost->hard_register->integer_to_sse to
> cost the construction in integer modes instead of addss?
I have no recollection of why we are mixing the sse_op and addss costs here ...
It's not an integer-to-SSE conversion either (again, the caller adjusts
for that case). We seem to use sse_op for the element inserts into an SSE
reg and addss for the inserts of SSE regs into a YMM or ZMM. I think it's
reasonable to change this to use sse_op consistently.