On 05/15/2018 08:41 AM, Richard Henderson wrote:
> On 05/15/2018 06:45 AM, Alex Bennée wrote:
>>> +float64 float64_silence_nan(float64 a, float_status *status)
>>> +{
>>> +    return float64_pack_raw(parts_silence_nan(float64_unpack_raw(a), status));
>>> +}
>>> +
>>
>> Not that I'm objecting to the rationalisation, but did you look at the
>> code generated now that we unpack NaNs? I guess NaN behaviour isn't on the
>> critical path for performance anyway....
> 
> Yes, I looked.  It's about 5 instructions instead of 1.
> But as you say, it's nowhere near the critical path.
> 
> Ugh.  I've also just realized that the shift isn't correct though...

Having fixed that and re-checked... the compiler is weird.

The float32 version optimizes to 1 insn, as we would hope.  The float16 version
optimizes to 5 insns, extracting and re-inserting the sign bit.  The float64
version optimizes to 10 insns, extracting and re-inserting the exponent as well.

Very odd.
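
For reference, here is roughly the raw-bits operation the whole
unpack/silence/repack round trip should fold down to, assuming the common
convention where silencing an sNaN just sets the top fraction bit (i.e. not
the snan_bit_is_one targets).  The *_raw names are made up for illustration
and are not the actual helpers:

#include <stdint.h>

/* Sketch only: set the quiet bit at the top of the fraction field. */

static inline uint16_t float16_silence_nan_raw(uint16_t a)
{
    return a | (UINT16_C(1) << 9);    /* quiet bit of the 10-bit fraction */
}

static inline uint32_t float32_silence_nan_raw(uint32_t a)
{
    return a | (UINT32_C(1) << 22);   /* quiet bit of the 23-bit fraction */
}

static inline uint64_t float64_silence_nan_raw(uint64_t a)
{
    return a | (UINT64_C(1) << 51);   /* quiet bit of the 52-bit fraction */
}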


r~
