On Wed, Mar 4, 2026 at 3:34 AM Trevor Gross <[email protected]> wrote:
>
> Hello all,
>
> I am interested in revisiting the return ABI of _Float16 on i386.
> Currently it is returned in xmm0, meaning SSE is required for the type.
> This is rather inconvenient when _Float16 is otherwise quite well
> supported. Compilers need to pick between hacking together a custom ABI
> that works on the baseline, or passing the burden on to users to gate
> everything.
>
> Is there any interest in adjusting the specification such that _Float16
> is returned in a GPR rather than SSE?

Changing an ABI at any point is wrong.
Why not change Rust to follow the ABI?
Why not make fp16 a conditionally supported feature in Rust, as it is in
any other language?
Changing the ABI requires either multilib or a flag day, and I doubt distros
want either of those at this stage, especially for 32-bit x86, which has had
a stable ABI for the last 20+ years.
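
As a rough illustration of what "conditionally supported" can look like in
C, the sketch below selects a storage type depending on whether the compiler
advertises _Float16. It assumes the compiler predefines __FLT16_MANT_DIG__
only when the type is usable for the current target and flags, which is
worth verifying for the toolchain at hand. The names half_storage_t,
HAVE_NATIVE_FLOAT16, half_storage_size and half_storage_demo are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: __FLT16_MANT_DIG__ is predefined only when _Float16 is
 * usable with the current target and flags. */
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_storage_t;   /* native half-precision type */
#define HAVE_NATIVE_FLOAT16 1
#else
typedef uint16_t half_storage_t;   /* fallback: raw binary16 bit pattern */
#define HAVE_NATIVE_FLOAT16 0
#endif

/* Both choices occupy two bytes, so data layouts stay the same. */
static size_t half_storage_size(void)
{
    return sizeof(half_storage_t);
}

/* Small self-check of the property stated above. */
static void half_storage_demo(void)
{
    assert(half_storage_size() == 2);
}
```

Code that needs arithmetic on the type would be gated on
HAVE_NATIVE_FLOAT16; code that only stores or forwards values can use
half_storage_t unconditionally.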

Thanks,
Andrew


>
> This was brought up before in the thread at [1], with the concern about
> efficient 16-bit moves between GPRs or memory and XMM. This doesn't seem
> to be relevant, however, given there isn't any reason to have a _Float16
> in XMM unless F16C is available, implying SSE2 and SSE4.1 for PINSRW and
> PEXTRW to/from memory (unless I am missing something?).
>
> A sample patch to the psABI is below. Needless to say there are
> compatibility concerns that come from a change but given workarounds
> already exist (e.g. in LLVM), it seems worth considering whether
> something should be codified to make this simpler for everyone.
>
> Best regards,
> Trevor
>
> [1]: https://inbox.sourceware.org/gcc-patches/[email protected]/
>
> (some CCs added from the linked discussion)
>
> --- patch follows ---
>
> From 1af72db89f9a10b93569fa0b9f64f65f2dd73334 Mon Sep 17 00:00:00 2001
> From: Trevor Gross <[email protected]>
> Date: Fri, 23 Jan 2026 21:11:43 +0000
> Subject: [PATCH] Return _Float16 and _Complex _Float16 in GPRs
>
> Currently the ABI specifies that _Float16 is to be passed on the stack
> and returned in xmm0, meaning SSE is required to support the type.
> Adjust both _Float16 and _Complex _Float16 to return in eax, dropping
> the SSE requirement.
>
> This has the benefit of making _Float16 ABI-compatible with `short`.
> ---
>  low-level-sys-info.tex | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
> index 0015c8c..a2d8d6d 100644
> --- a/low-level-sys-info.tex
> +++ b/low-level-sys-info.tex
> @@ -384,8 +384,7 @@ of some 64bit return types & No \\
>  \ESI & callee-saved register & yes \\
>  \EDI & callee-saved register & yes \\
>  \reg{xmm0} & scratch register; also used to pass the first \code{__m128}
> -             parameter and return \code{__m128}, \code{_Float16},
> -            \code{_Complex _Float16} & No \\
> +             parameter and return \code{__m128} & No \\
>  \reg{ymm0} & scratch register; also used to pass the first \code{__m256}
>               parameter and return \code{__m256} & No \\
>  \reg{zmm0} & scratch register; also used to pass the first \code{__m512}
> @@ -472,7 +471,11 @@ and \texttt{unions}) are always returned in memory.
>      & \texttt{\textit{any-type} *} & \EAX \\
>      & \texttt{\textit{any-type} (*)()} & \\
>      \hline
> -    & \texttt{_Float16} & \reg{xmm0} \\
> +    & \texttt{_Float16} & \reg{ax} \\
> +    & & The upper 16 bits of \EAX are undefined.
> +    The caller must not \\
> +    & & rely on these being set in a predefined
> +    way by the called function. \\
>      \cline{2-3}
>      & \texttt{float} & \reg{st0} \\
>      \cline{2-3}
> @@ -484,7 +487,7 @@ and \texttt{unions}) are always returned in memory.
>      \cline{2-3}
>      & \texttt{__float128} & memory \\
>      \hline
> -    & \texttt{_Complex _Float16} & \reg{xmm0} \\
> +    & \texttt{_Complex _Float16} & \reg{eax} \\
>      & & The real part is returned in bits 0..15. The imaginary part is
>          returned \\
>      & & in bits 16..31.\\
> --
> 2.50.1 (Apple Git-155)
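
For reference, the return layout described in the quoted patch can be
modelled in portable C. This is only a sketch under stated assumptions:
raw IEEE binary16 bit patterns (uint16_t) stand in for _Float16 so that it
builds without compiler support for the type, a uint32_t stands in for
eax, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar case: the value is in ax. The upper 16 bits of eax are
 * undefined, so the caller keeps only the low half. */
static uint16_t f16_from_eax(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* Complex case: real part in bits 0..15, imaginary part in bits 16..31. */
static uint32_t complex_f16_to_eax(uint16_t re, uint16_t im)
{
    return (uint32_t)re | ((uint32_t)im << 16);
}

static uint16_t complex_f16_real(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

static uint16_t complex_f16_imag(uint32_t eax)
{
    return (uint16_t)(eax >> 16);
}

/* Self-check: 0x3C00 is 1.0 and 0x4000 is 2.0 in IEEE binary16. */
static void layout_demo(void)
{
    uint32_t eax = complex_f16_to_eax(0x3C00u, 0x4000u);
    assert(eax == 0x40003C00u);
    assert(complex_f16_real(eax) == 0x3C00u);
    assert(complex_f16_imag(eax) == 0x4000u);

    /* Whatever the callee left in the upper half is ignored. */
    assert(f16_from_eax(0xDEAD3C00u) == 0x3C00u);
}
```

The masking in f16_from_eax reflects the patch's wording that the caller
must not rely on the upper 16 bits of eax.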
