Function attribute((optimize(...))) ignored on inline functions?
I'd like to tell GCC that it's okay to inline functions (such as rintf(), to get the SSE4.1 roundss instruction) at particular call sites without compiling the entire source file, or the calling function, with different CFLAGS. I attempted this by writing inline wrapper functions annotated with __attribute__((optimize(...))), but it appears that the annotation does not apply to inline functions. Take for example ex.c:

```c
#include <math.h>

static inline float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper_inline(float x)
{
    return rintf(x);
}

float rintf_wrapper_inline_call(float x)
{
    return rintf_wrapper_inline(x);
}

float __attribute__((optimize("-fno-trapping-math")))
rintf_wrapper(float x)
{
    return rintf(x);
}
```

```
% gcc -O2 -msse4.1 -c ex.c
% objdump -d ex.o

ex.o:     file format elf64-x86-64

Disassembly of section .text:

:
   0:   e9 00 00 00 00          jmpq   5
   5:   66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
   c:   00 00 00 00
0010 :
  10:   66 0f 3a 0a c0 04       roundss $0x4,%xmm0,%xmm0
  16:   c3                      retq
```

whereas I expected rintf_wrapper_inline_call to compile to the same code as rintf_wrapper (a single roundss).

I've read that per-function optimization is broken [1]. Is this still the case? Is there a way to accomplish what I want?

[1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
match_scratch causing pattern mismatch
In a GCC port to a 16-bit CPU that uses CC flags for branching, I'm experimenting with using a 32-bit subtract for compare instead of multiple 16-bit compares and branches. My cbranch4 expander produces the compare and conditional branch patterns:

```c
cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
flags = gen_rtx_REG ( cmpmode, CC_REGNUM );
compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));
```

To implement the compare using a subtract I need an HImode scratch register, so I used a match_scratch:

```
(define_insn "comparesi3"
  [(set (reg:CC CC_REGNUM)
        (compare:CC (match_operand:SI 0 "register_operand" "r,r")
                    (match_operand:SI 1 "rhs_operand" "r,i")))
   (clobber (match_scratch:HI 2 "=r,r"))]
  ""
```

When I do this, the compare no longer matches, and I get failures like this in the vregs pass:

```
../../../libgcc/unwind-dw2.c:1224:1: error: unrecognizable insn:
 }
 ^
(insn 69 68 70 7 (set (reg:CC 16 flags)
        (compare:CC (reg:SI 44 [ D.5851 ])
            (reg:SI 169))) ../../../libgcc/unwind-dw2.c:972 -1
     (nil))
```

When I remove the match_scratch these errors disappear, but of course then I don't have the scratch register needed to emit the proper assembler instructions.

I'm aware that it's the combiner that understands clobbers etc., so in the .md file I tried adding a dummy comparesi3 pattern without the match_scratch, after the pattern containing the match_scratch. This sometimes works; however, on occasion the combiner selects the dummy pattern instead of the match_scratch pattern.

Any insight appreciated...

Cheers,
Paul
Re: match_scratch causing pattern mismatch
Of course, the answer is to use

```c
emit_insn( gen_comparesi3( op0, op1 ));
```

which generates the pattern with the required match_scratch, instead of

```c
cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
flags = gen_rtx_REG ( cmpmode, CC_REGNUM );
compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));
```

Sorry for the bother...

On 31/07/15 08:39, Paul Shortis wrote:
> in a GCC port to a 16 bit cpu that uses CC flags for branching, I'm experimenting with using a 32 bit subtract for compare instead of multiple 16 bit compares and branches.
> [...]
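For comparison, here is a sketch of building by hand the RTL that gen_comparesi3 produces. It shows why the bare SET did not match: the define_insn pattern is a PARALLEL of the SET and a CLOBBER, and outside of combine (which can add missing clobbers) an insn has to match a pattern's shape exactly. This is a fragment for a backend expander, not a standalone program. It assumes the GCC 5-era gen_rtx_SET that takes a mode argument, as in the code above (GCC 6 and later drop that argument), and it only matches when cmpmode is CCmode, because the pattern hard-codes reg:CC.

```c
/* Fragment for the cbranch expander; flags and compare are built as above.
   Assumes the GCC 5 signature gen_rtx_SET (mode, dest, src).  */
rtx set = gen_rtx_SET (VOIDmode, flags, compare);

/* An HImode SCRATCH stands in for match_scratch operand 2; the register
   allocator later replaces it with a real HImode register.  */
rtx clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_SCRATCH (HImode));

/* Wrap both in a PARALLEL so the insn has the same shape as "comparesi3".  */
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber)));
```

Calling the generated gen_comparesi3 remains the simpler route, since it stays in sync with the .md pattern automatically.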
Controlling instruction alternative selection
I'm working with a CPU that has a restricted set of registers able to do three-address maths, whereas ALL registers can do two-address maths. If I define

```
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "0,0")
                 (match_operand:SI 2 "rhs_operand" "r,i")))
```

so that all adds are done using two-address instructions, then all is fine. If, however, I change addsi3 to the following (where the constraint 'R' is the smaller set of three-address registers)

```
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=R,r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "R,0,0")
                 (match_operand:SI 2 "rhs_operand" "R,r,i")))
```

to take advantage of the three-address instructions, then the three-address instructions are used successfully on many occasions. However, when register pressure on the 'R' class is high, the allocator never falls back to using the entire register set by employing the two-address instructions, resulting in:

```
error: unable to find a register to spill in class ‘GP_REGS’
```

Enabling LRA and inspecting the RTL dump indicates that both alternatives (R and r) seem equally appealing to the allocator, so it chooses 'R' and fails. The GCC internals documentation indicates that the '0' alternatives should be placed at the end of the alternatives list, so I'm guessing 'R' will always be chosen.

Using constraint disparaging (?R) eradicates the errors, but of course that causes the 'R' three-address alternative never to be used.

Suggestions?