I'd like to tell gcc that it's okay to inline certain functions (such as rintf(), to get the SSE4.1 roundss instruction) at particular call sites, without compiling the entire source file or the calling function with different CFLAGS.
I attempted this by making inline wrapper functions annotated with `__attribute__((optimize(...)))`, but it appears that the annotation does not apply to inline functions. Take, for example, ex.c:

```c
#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
    return rintf(x);
}

/* Calls the inline wrapper above; built with the default flags. */
float
rintf_wrapper_inline_call(float x)
{
    return rintf_wrapper_inline(x);
}

/* Non-inline function carrying the attribute itself. */
float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
    return rintf(x);
}
```

```
% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <rintf_wrapper_inline_call>:
   0:   e9 00 00 00 00          jmpq   5 <rintf_wrapper_inline_call+0x5>
   5:   66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
   c:   00 00 00 00

0000000000000010 <rintf_wrapper>:
  10:   66 0f 3a 0a c0 04       roundss $0x4,%xmm0,%xmm0
  16:   c3                      retq
```

Here rintf_wrapper_inline_call ends up as a tail call to rintf(), whereas I expected it to be the same as rintf_wrapper, i.e. a single roundss instruction.

I've read that per-function optimization is broken [1]. Is this still the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
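For comparison, one alternative that sidesteps the optimize attribute entirely would be to request the instruction through the SSE4.1 intrinsic instead of relying on gcc to expand rintf(). This is only a minimal sketch, not a confirmed fix for the attribute behaviour: `rint_sse41` and `rint_sse41_call` are hypothetical names, and I'm assuming an x86-64 target and the default round-to-nearest-even mode in MXCSR. The `target("sse4.1")` attribute is redundant when the file is already built with -msse4.1, as above.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_round_ss */

/* Hypothetical helper: rounds x to an integral value using the current
 * MXCSR rounding mode. _MM_FROUND_CUR_DIRECTION is the immediate 0x4,
 * the same immediate that appears in "roundss $0x4" in the disassembly. */
static inline float __attribute__((target("sse4.1")))
rint_sse41(float x)
{
    __m128 v = _mm_set_ss(x);                         /* x in the low lane */
    v = _mm_round_ss(v, v, _MM_FROUND_CUR_DIRECTION); /* round the low lane */
    return _mm_cvtss_f32(v);                          /* extract the low lane */
}

/* Hypothetical call site: the intrinsic is requested explicitly here,
 * so no -fno-trapping-math is involved in selecting the instruction. */
float
rint_sse41_call(float x)
{
    return rint_sse41(x);
}
```

With the default rounding mode, `rint_sse41(2.5f)` gives 2.0f and `rint_sse41(3.5f)` gives 4.0f (ties go to even), which is the same result rintf() would produce under that mode.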