On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:

> The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> require -mavx512fp16, we need to compile complex _Float16 functions in
> libgcc without -mavx512fp16.  Complex _Float16 performance is very
> important for our _Float16 usage.   _Float16 performance has to be
> very fast.  There should be no emulation anywhere when -mavx512fp16
> is used.   That is why _Float16 is available only with -mavx512fp16.

You could build IFUNC versions of the libgcc functions (like float128 on 
powerpc64le), to be fast (modulo any IFUNC overhead) when run on 
AVX512FP16 hardware.  Or arrange for different libcall names to be used 
depending on the instruction set features available, and build those 
functions under multiple names, to be fast when the application is built 
with -mavx512fp16.
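A minimal sketch of the second option. The entry-point names (hc_mul_avx512fp16, hc_mul_f16c, hc_mul_generic) are hypothetical, and float stands in for _Float16 so the sketch builds with any C99 compiler. One multiply body is emitted under several names. The caller then picks a name at compile time from the macros GCC predefines for the enabled ISA (__AVX512FP16__ with -mavx512fp16, __F16C__ with -mf16c). The IFUNC alternative is not shown.

```c
#include <assert.h>

/* Stand-in for a complex half-precision result.  float is used only so
   the sketch builds anywhere; all names here are hypothetical. */
typedef struct { float re, im; } hc_pair;

/* Emit the same multiply body under an ISA-specific name.  In a real
   library each copy would come from one source file compiled several
   times with different -m options (-mavx512fp16, -mf16c, or neither). */
#define DEFINE_HC_MUL(suffix)                                     \
  hc_pair hc_mul_##suffix(float a, float b, float c, float d) {  \
    hc_pair r = { a * c - b * d, a * d + b * c };                 \
    return r;                                                     \
  }

DEFINE_HC_MUL(avx512fp16)
DEFINE_HC_MUL(f16c)
DEFINE_HC_MUL(generic)

/* Caller side: choose the entry point at compile time from the ISA
   macros, so code built with -mavx512fp16 never calls the generic one. */
#if defined(__AVX512FP16__)
#  define HC_MUL hc_mul_avx512fp16
#elif defined(__F16C__)
#  define HC_MUL hc_mul_f16c
#else
#  define HC_MUL hc_mul_generic
#endif

/* Whichever name is selected, the result must be the same. */
void hc_mul_names_check(void) {
  hc_pair p = HC_MUL(1.0f, 2.0f, 3.0f, 4.0f);          /* (1+2i)(3+4i) */
  hc_pair q = hc_mul_generic(1.0f, 2.0f, 3.0f, 4.0f);
  assert(p.re == -5.0f && p.im == 10.0f);
  assert(p.re == q.re && p.im == q.im);
}
```

The selection happens entirely in the preprocessor, so it costs nothing at run time. The trade-off is that the binary is tied to the ISA it was compiled for, whereas IFUNC defers the choice to load time.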

Since the HCmode libgcc functions just convert to/from SFmode and do all 
their computations in SFmode (to avoid intermediate overflow or 
cancellation that would make the results inaccurate), an F16C version may 
make sense as well.  That assumes the F16C conversion instructions are 
still efficient once you allow for zeroing the unused parts of the vector 
register, where that is needed to avoid spurious exceptions from 
converting junk data in those parts of the register.
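A sketch of the widen, compute in SFmode, narrow scheme for a complex binary16 multiply. Binary16 values are carried as uint16_t bit patterns, so it does not depend on whether the compiler accepts _Float16 without -mavx512fp16. The helpers half_to_float and float_to_half are hypothetical portable stand-ins for the F16C instructions VCVTPH2PS and VCVTPS2PH, and hc_mul is an illustrative name, not libgcc's entry point. The check exercises the intermediate-overflow case that computing in float avoids.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen binary16 to float; exact for every input (like VCVTPH2PS). */
static float half_to_float(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  uint32_t bits;
  float f;
  if (exp == 0x1Fu) {                  /* Inf or NaN */
    bits = sign | 0x7F800000u | (man << 13);
  } else if (exp != 0) {               /* normal: rebias 15 -> 127 */
    bits = sign | ((exp + 112u) << 23) | (man << 13);
  } else if (man == 0) {               /* signed zero */
    bits = sign;
  } else {                             /* subnormal: man * 2^-24 */
    f = (float)man * 0x1p-24f;
    memcpy(&bits, &f, sizeof bits);
    bits |= sign;
  }
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Narrow float to binary16, round to nearest even (like VCVTPS2PH).
   NaNs become a quiet NaN; payloads are not preserved. */
static uint16_t float_to_half(float f) {
  uint32_t x;
  memcpy(&x, &f, sizeof x);
  uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
  x &= 0x7FFFFFFFu;
  if (x >= 0x7F800000u)                /* Inf or NaN */
    return (uint16_t)(sign | (x > 0x7F800000u ? 0x7E00u : 0x7C00u));
  if (x >= 0x477FF000u)                /* >= 65520 rounds to Inf */
    return (uint16_t)(sign | 0x7C00u);
  if (x < 0x38800000u) {               /* below 2^-14: subnormal or zero */
    uint32_t e = x >> 23;
    if (e < 102u)                      /* below 2^-25: rounds to zero */
      return sign;
    uint32_t m = (x & 0x7FFFFFu) | 0x800000u;
    uint32_t shift = 126u - e;         /* 14..24 */
    uint32_t q = m >> shift;
    uint32_t rem = m & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (q & 1u)))
      q++;                             /* q == 0x400 is the smallest normal */
    return (uint16_t)(sign | q);
  }
  x -= 112u << 23;                     /* normal: rebias 127 -> 15 */
  x += 0xFFFu + ((x >> 13) & 1u);      /* round the 13 dropped bits to even */
  return (uint16_t)(sign | (x >> 13));
}

/* (a+bi)(c+di): widen, compute in float, narrow once at the end.  Each
   product of two binary16 values needs at most 22 significand bits, so
   it is exact in float and cannot overflow; only the sums round.  The
   Annex G recovery for infinite/NaN results is omitted here. */
void hc_mul(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
            uint16_t *re, uint16_t *im) {
  float fa = half_to_float(a), fb = half_to_float(b);
  float fc = half_to_float(c), fd = half_to_float(d);
  *re = float_to_half(fa * fc - fb * fd);
  *im = float_to_half(fa * fd + fb * fc);
}

/* 256*256 = 65536 exceeds the binary16 maximum (65504), yet
   (256+16i)(256+16i) = 65280+8192i is representable. */
void hc_mul_overflow_check(void) {
  uint16_t re, im;
  hc_mul(0x5C00, 0x4C00, 0x5C00, 0x4C00, &re, &im);  /* 256, 16, 256, 16 */
  assert(re == 0x7BF8 && im == 0x7000);             /* 65280, 8192 */
  assert(float_to_half(65536.0f) == 0x7C00);        /* product alone: Inf */
}
```

An F16C build would replace the two helpers with the hardware conversions. That is where the caveat about zeroing unused vector lanes would come in.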

-- 
Joseph S. Myers
jos...@codesourcery.com
