On Wednesday, 4 March 2026 03:27:40 Pacific Standard Time Trevor Gross wrote:
> This was brought up before in the thread at [1], with the concern about
> efficient 16-bit moves between GPRs or memory and XMM. This doesn't seem
> to be relevant, however, given there isn't any reason to have a _Float16
> in XMM unless F16C is available, implying SSE2 and SSE4.1 for PINSRW and
> PEXTRW to/from memory (unless I am missing something?).

There is still a cost of transferring from one register file to another: those 
operations cost 3 cycles. That would imply efficient software that uses F16C or 
(better yet) AVX512FP16 would pay an extra 3-cycle penalty to move into a GPR 
on function return and another 3 cycles to reload it back into the SSE 
register file.

This is of course the opposite of what would happen on systems requiring 
emulation of FP16 conversions: one would pay a 3-cycle penalty to move from GPR 
to SSE on function return and another 3 cycles to move it back to make any use 
of the returned number.

So there are two questions to be answered, one of which has already been:

1) does FP16 support require SSE?

H.J. stated it does in the discussion you linked to and no one argued.

2) whom are we optimising this for: emulated conversions or HW-backed ones?

F16C was first introduced in 2013, though there are still systems without AVX 
being produced (e.g. embedded Pentium and Celeron). But they already have a 
massive performance loss by having to convert to and from FP32 in software, 
before performing even simple math like:

_Float16 f(_Float16 a, _Float16 b)
{
    return a + b;
}

So I'd argue it's not worth optimising for them, and it's far better to allow 
the best performance when one has HW-backed conversion instructions (and for 
GCC, using -mfpmath=sse).
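
As a rough illustration of the software cost mentioned above, here is a minimal 
sketch of just the widening half of an emulated conversion (FP16 bits to FP32) 
in portable C. The helper name half_bits_to_float is hypothetical and is not 
taken from any compiler runtime; the narrowing direction is not shown and would 
additionally need rounding and overflow handling, so a real emulated a + b does 
noticeably more work than this.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): widen IEEE 754 binary16 bits
 * to a binary32 float without any hardware conversion instruction. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* sign bit to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;              /* 5-bit exponent */
    uint32_t mant = h & 0x3FFu;                     /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, payload shifted up */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero */
        bits = sign;
    } else {
        /* Subnormal: shift until the implicit leading 1 appears */
        uint32_t shift = 0;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            shift++;
        }
        mant &= 0x3FFu;
        bits = sign | ((113u - shift) << 23) | (mant << 13);
    }

    memcpy(&f, &bits, sizeof f);  /* type-pun without aliasing issues */
    return f;
}
```

With F16C, this whole function corresponds to a single conversion instruction 
operating on a value that is already in an SSE register.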

Are you asking to reopen the "requires SSE" discussion?

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
