Hi:

The original problem was that some users wanted the command-line option -ffast-math not to affect code written with intrinsics, i.e. for code like

    #include <immintrin.h>

    __m256d
    foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
    {
      __m256d tmp = _mm256_add_pd (a, b);
      tmp = _mm256_sub_pd (tmp, c);
      tmp = _mm256_sub_pd (tmp, d);
      return tmp;
    }

compiled with -O2 -mavx2 -ffast-math, users expected the generated code to be

    vaddpd ymm0, ymm0, ymm1
    vsubpd ymm0, ymm0, ymm2
    vsubpd ymm0, ymm0, ymm3

but not

    vsubpd ymm1, ymm1, ymm2
    vsubpd ymm0, ymm0, ymm3
    vaddpd ymm0, ymm1, ymm0

On the LLVM side, there are mechanisms like

    #pragma float_control(precise, on, push)
    ...(intrinsics definition)..
    #pragma float_control(pop)

When the intrinsics are inlined, their IR is marked with "no-fast-math". Even if the caller is compiled with -ffast-math, reassociation is applied only to IR that is not marked with "no-fast-math".

Supporting fast-math control over a region inside a function seems more flexible. Does GCC have a similar mechanism?

--
BR,
Hongtao