On 05/15/2018 08:41 AM, Richard Henderson wrote:
> On 05/15/2018 06:45 AM, Alex Bennée wrote:
>>> +float64 float64_silence_nan(float64 a, float_status *status)
>>> +{
>>> +    return float64_pack_raw(parts_silence_nan(float64_unpack_raw(a),
>>> +                                              status));
>>> +}
>>> +
>>
>> Not that I'm objecting to the rationalisation but did you look at the
>> code generated now we unpack NaNs? I guess NaN behaviour isn't the
>> critical path for performance anyway....
>
> Yes, I looked.  It's about 5 instructions instead of 1.
> But as you say, it's nowhere near critical path.
>
> Ug.  I've also just realized that the shift isn't correct though...
Having fixed that and re-checked... the compiler is weird.

The float32 version optimizes to 1 insn, as we would hope.
The float16 version optimizes to 5 insns, extracting and re-inserting
the sign bit.  The float64 version optimizes to 10 insns, extracting
and re-inserting the exponent as well.

Very odd.


r~
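
For reference, a minimal self-contained sketch of the round trip under
discussion.  This is not the actual QEMU code: the helper names
(f64_unpack_raw, f64_pack_raw, parts_silence_nan) and the FloatParts
layout are simplified stand-ins, and it assumes snan_bit_is_one == 0,
so silencing a NaN just means setting the quiet bit (bit 51 of the
binary64 fraction):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t frac;   /* 52-bit fraction */
        int32_t  exp;    /* 11-bit biased exponent */
        bool     sign;
    } FloatParts;

    static FloatParts f64_unpack_raw(uint64_t a)
    {
        return (FloatParts){
            .frac = a & ((1ULL << 52) - 1),
            .exp  = (a >> 52) & 0x7ff,
            .sign = a >> 63,
        };
    }

    static uint64_t f64_pack_raw(FloatParts p)
    {
        return ((uint64_t)p.sign << 63) | ((uint64_t)p.exp << 52) | p.frac;
    }

    static FloatParts parts_silence_nan(FloatParts p)
    {
        p.frac |= 1ULL << 51;   /* set the quiet bit */
        return p;
    }

    /* The whole round trip is equivalent to "a | (1ULL << 51)". */
    static uint64_t f64_silence_nan(uint64_t a)
    {
        return f64_pack_raw(parts_silence_nan(f64_unpack_raw(a)));
    }

    int main(void)
    {
        uint64_t snan = 0x7ff0000000000001ULL;   /* a signaling NaN */
        printf("%016llx -> %016llx\n",
               (unsigned long long)snan,
               (unsigned long long)f64_silence_nan(snan));
        return 0;
    }

With snan_bit_is_one fixed at zero, the unpack/silence/pack sequence
reduces to a single OR of bit 51, which is presumably the 1-insn result
the compiler only manages to produce for the float32 case above.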