I'd like to tell gcc that it's okay to inline certain functions (such as
rintf(), to get the SSE4.1 roundss instruction) at particular call
sites without compiling the entire source file or the calling function
with different CFLAGS.

I attempted this by making inline wrapper functions annotated with
__attribute__((optimize(...))), but it appears that the annotation does
not apply to inline functions. Take, for example, ex.c:

#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
   return rintf(x);
}

float
rintf_wrapper_inline_call(float x)
{
   return rintf(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
   return rintf(x);
}

% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <rintf_wrapper_inline_call>:
   0: e9 00 00 00 00       jmpq   5 <rintf_wrapper_inline_call+0x5>
   5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
   c: 00 00 00 00

0000000000000010 <rintf_wrapper>:
  10: 66 0f 3a 0a c0 04     roundss $0x4,%xmm0,%xmm0
  16: c3                   retq

whereas I expected that rintf_wrapper_inline_call would be the same as
rintf_wrapper.

I've read that per-function optimization is broken [1]. Is this still
the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
