Function __attribute__((optimize(...))) ignored on inline functions?

2015-07-30 Thread Matt Turner
I'd like to tell gcc that it's okay to inline functions (such as
rintf(), to get the SSE4.1 roundss instruction) at particular call
sites without compiling the entire source file or calling function
with different CFLAGS.

I attempted this by making inline wrapper functions annotated with
__attribute__((optimize(...))), but it appears that the annotation does
not apply to inline functions? Take, for example, ex.c:

#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
   return rintf(x);
}

float
rintf_wrapper_inline_call(float x)
{
   return rintf_wrapper_inline(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
   return rintf(x);
}

% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o: file format elf64-x86-64


Disassembly of section .text:

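0000000000000000 <rintf_wrapper_inline_call>:
   0: e9 00 00 00 00   jmpq   5 <rintf_wrapper_inline_call+0x5>
   5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
   c: 00 00 00 00

0000000000000010 <rintf_wrapper>: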
  10: 66 0f 3a 0a c0 04 roundss $0x4,%xmm0,%xmm0
  16: c3   retq

whereas I expected rintf_wrapper_inline_call to compile to the same
roundss instruction as rintf_wrapper.

I've read that per-function optimization is broken [1]. Is this still
the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html


match_scratch causing pattern mismatch

2015-07-30 Thread Paul Shortis
In a GCC port to a 16-bit CPU that uses CC flags for branching,
I'm experimenting with using a 32-bit subtract for compares
instead of multiple 16-bit compares and branches.


My cbranch4 expander produces compare and conditional
branch patterns...


  cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
  flags = gen_rtx_REG ( cmpmode, CC_REGNUM );

  compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
  emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

To implement the compare using a subtract I need an HImode scratch
register, so I used a match_scratch:


(define_insn "comparesi3"
  [ (set (reg:CC CC_REGNUM)
         (compare:CC (match_operand:SI 0 "register_operand" "r,r")
                     (match_operand:SI 1 "rhs_operand" "r,i")))
    (clobber (match_scratch:HI 2 "=r,r"))
  ]
  ""


When I do this, the compare no longer matches and I get failures 
like this in the vregs pass...


../../../libgcc/unwind-dw2.c:1224:1: error: unrecognizable insn:
 }
 ^
(insn 69 68 70 7 (set (reg:CC 16 flags)
(compare:CC (reg:SI 44 [ D.5851 ])
(reg:SI 169))) ../../../libgcc/unwind-dw2.c:972 -1
 (nil))

When I remove the match_scratch these errors disappear, but of
course I then don't have the scratch register needed to implement
the proper assembler instructions.


I'm aware that it's the combiner that understands clobbers etc.
So, in the .md file I tried adding a dummy comparesi3 pattern
without the match_scratch, placed after the pattern containing
the match_scratch. This sometimes works, but on occasion the
combiner selects the dummy pattern instead of the match_scratch
pattern.


Any insight appreciated...

Cheers, Paul



Re: match_scratch causing pattern mismatch

2015-07-30 Thread Paul Shortis

Of course, the answer is to

emit_insn( gen_comparesi3( op0, op1 ));

which generates the required match_scratch

instead of ...

  cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
  flags = gen_rtx_REG ( cmpmode, CC_REGNUM );

  compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
  emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

Sorry for the bother...
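
A sketch of why this works, assuming the GCC 5 era RTL API used above
(gen_rtx_SET still takes VOIDmode): the generated gen_comparesi3 builds the
whole pattern, including the clobber, so the emitted insn has the shape the
define_insn expects. Roughly equivalent hand-written code would be the
fragment below. It belongs in the expander's C body and is not a standalone
program; op0 and op1 are the expander operands as in the snippet above, and
CCmode is assumed because the pattern uses reg:CC.

```c
/* Fragment for the cbranch expander body; needs the GCC backend headers. */
rtx flags   = gen_rtx_REG (CCmode, CC_REGNUM);
rtx compare = gen_rtx_COMPARE (CCmode, op0, op1);
rtx set     = gen_rtx_SET (VOIDmode, flags, compare);

/* The clobber of a not-yet-allocated HImode scratch that the
   comparesi3 pattern requires alongside the set. */
rtx clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_SCRATCH (HImode));

/* Emit set and clobber together as one PARALLEL, matching the pattern. */
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber)));
```

Calling gen_comparesi3 remains the simpler choice, since it stays in sync
with the pattern if the define_insn changes later.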



Controlling instruction alternative selection

2015-07-30 Thread Paul Shortis


I'm working with a CPU having a restricted set of registers that can do 
three-address maths, whereas ALL registers can do two-address maths.


If I define

(define_insn "addsi3"
  [ (set (match_operand:SI 0 "register_operand" "=r,r")
         (plus:SI (match_operand:SI 1 "register_operand" "0,0")
                  (match_operand:SI 2 "rhs_operand" "r,i")))


so that all adds are done using two-address instructions, then all is fine. 
If, however, I change addsi3 to


(where the constraint 'R' is the smaller set of three address registers)

(define_insn "addsi3"
  [ (set (match_operand:SI 0 "register_operand" "=R,r,r")
         (plus:SI (match_operand:SI 1 "register_operand" "R,0,0")
                  (match_operand:SI 2 "rhs_operand" "R,r,i")))



to take advantage of the three-address instructions, then the three-address 
instructions are used successfully on many occasions.


However, when register pressure on the 'R' class is high, the allocator 
never falls back to using the entire register set by employing the 
two-address instructions.


Resulting in ...

error: unable to find a register to spill in class ‘GP_REGS’

Enabling LRA and inspecting the RTL dump indicates that both 
alternatives (R and r) seem to be equally appealing to the allocator, so 
it chooses 'R' and fails.


The GCC internals documentation indicates that the '0' alternatives should 
be placed at the end of the alternatives list, so I'm guessing 'R' will 
always be chosen.


Using constraint disparaging (?R) eradicates the errors, but of course 
that causes the 'R' three address alternative to never be used.


Suggestions?