On Thu, Jul 30, 2015 at 6:40 PM, Matt Turner <matts...@gmail.com> wrote:
> I'd like to tell gcc that it's okay to inline functions (such as
> rintf(), to get the SSE4.1 roundss instruction) at particular call
> sites without compiling the entire source file or calling function
> with different CFLAGS.
>
> I attempted this by making inline wrapper functions annotated with
> __attribute__((optimize(...))), but it appears that the annotation does
> not apply to inline functions? Take for example, ex.c:
>
> #include <math.h>
>
> static inline float __attribute__((optimize("-fno-trapping-math")))
> rintf_wrapper_inline(float x)
> {
>     return rintf(x);
> }
>
> float
> rintf_wrapper_inline_call(float x)
> {
>     return rintf_wrapper_inline(x);
> }
>
> float __attribute__((optimize("-fno-trapping-math")))
> rintf_wrapper(float x)
> {
>     return rintf(x);
> }
>
> % gcc -O2 -msse4.1 -c ex.c
> % objdump -d ex.o
>
> ex.o:     file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <rintf_wrapper_inline_call>:
>    0: e9 00 00 00 00          jmpq   5 <rintf_wrapper_inline_call+0x5>
>    5: 66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
>    c: 00 00 00 00
>
> 0000000000000010 <rintf_wrapper>:
>   10: 66 0f 3a 0a c0 04       roundss $0x4,%xmm0,%xmm0
>   16: c3                      retq
>
> whereas I expected that rintf_wrapper_inline_call would be the same as
> rintf_wrapper.
>
> I've read that per-function optimization is broken [1]. Is this still
> the case? Is there a way to accomplish what I want?

Not in this way. Once the wrapper is inlined, the no-trapping-math flag
is gone, because the inlined body is compiled with the caller's options.
The only way is to use SSE intrinsics directly here or to keep the
optimized variant from being inlined.

Richard.

> [1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html