https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #14)
>
> Counting latencies, I think vinserti64x2 is 1 cycle, while vpinst is an
> integer->SSE move that is slower and set to 4 cycles.
> Overall it is wrong that we use addss cost to estimate vec_construct:
>
>       case vec_construct:
>         {
>           int n = TYPE_VECTOR_SUBPARTS (vectype);
>           /* N - 1 element inserts into an SSE vector, the possible
>              GPR -> XMM move is accounted for in add_stmt_cost.  */
>           if (GET_MODE_BITSIZE (mode) <= 128)
>             return (n - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
>           else if (GET_MODE_BITSIZE (mode) == 256)
>             return ((n - 2) * ix86_cost->sse_op
>                     + ix86_vec_cost (mode, ix86_cost->addss));
>           /* One vinserti64x4 and two vinserti128 for combining SSE
>              and AVX256 vectors to AVX512.  */
>           else if (GET_MODE_BITSIZE (mode) == 512)
>             return ((n - 4) * ix86_cost->sse_op
>                     + 3 * ix86_vec_cost (mode, ix86_cost->addss));
>           gcc_unreachable ();
>         }
>
> I think we may want to use ix86_cost->hard_register->integer_to_sse to
> cost the construction in integer modes instead of addss?
I have no recollection of why we are mixing the sse_op and addss costs here ...
It's not an integer-to-SSE conversion either (again, the caller adjusts
for that case). We seem to use sse_op for the element inserts into an SSE
reg and addss for the inserts of SSE regs into a YMM or ZMM. I think it's
reasonable to change this to use sse_op consistently.