Hi,
  The original problem is that some users do not want the command-line
option -ffast-math to affect production code written with intrinsics,
i.e. for code like

#include <immintrin.h>
__m256d
foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
{
  __m256d tmp = _mm256_add_pd (a, b);
  tmp = _mm256_sub_pd (tmp, c);
  tmp = _mm256_sub_pd (tmp, d);
  return tmp;
}

when this is compiled with -O2 -mavx2 -ffast-math, users expect the
generated code to keep the source order of operations, like

vaddpd ymm0, ymm0, ymm1
vsubpd ymm0, ymm0, ymm2
vsubpd ymm0, ymm0, ymm3

rather than the reassociated form

vsubpd ymm1, ymm1, ymm2
vsubpd ymm0, ymm0, ymm3
vaddpd ymm0, ymm1, ymm0


On the LLVM side, there are mechanisms like
#pragma float_control(precise, on, push)
... (intrinsic definitions) ...
#pragma float_control(pop)

When the intrinsics are inlined, their IR is marked "no-fast-math",
and even if the caller is compiled with -ffast-math, reassociation
only applies to IR that is not marked "no-fast-math". This seems more
flexible, because it supports fast-math control over a region inside
a function rather than only over whole functions.
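
To make the region-level control concrete, here is a minimal sketch,
assuming Clang (a compiler that does not know the pragma will most
likely just ignore it, possibly with a warning). The names add_sub_sub
and caller are hypothetical, and scalar double stands in for __m256d
so that no AVX flags are needed to try it:

```c
/* Sketch only: assumes Clang's float_control pragma.
   add_sub_sub and caller are hypothetical names for illustration. */

/* Everything between push and pop is compiled with precise FP
   semantics, even if the translation unit uses -ffast-math. */
#pragma float_control(precise, on, push)
static inline double
add_sub_sub (double a, double b, double c, double d)
{
  double tmp = a + b;   /* corresponds to _mm256_add_pd (a, b)   */
  tmp = tmp - c;        /* corresponds to _mm256_sub_pd (tmp, c) */
  tmp = tmp - d;        /* corresponds to _mm256_sub_pd (tmp, d) */
  return tmp;
}
#pragma float_control(pop)

/* The caller is outside the precise region, so under -ffast-math its
   own arithmetic may still be reassociated. */
double
caller (double a, double b, double c, double d)
{
  return add_sub_sub (a, b, c, d);
}
```

With Clang at -O2 -ffast-math, the intent is that the three operations
inlined from add_sub_sub keep their source order, while any arithmetic
written directly in caller remains eligible for fast-math
transformations.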

Does GCC have a similar mechanism?


-- 
BR,
Hongtao
