-fprofile-arcs changes the structure of basic blocks
Hi,

I want to use profiling information. I know there are two relevant fields in each basic block, count and frequency. I want to use frequency because the compiled program is for another architecture, so it cannot run on the host.

I use -fprofile-arcs, and I can see the frequency value when I debug cc1. But I happened to notice that adding -fprofile-arcs changes the whole structure of the basic blocks. I compared the vcg output files with and without -fprofile-arcs and found that they are totally different.

My question is: why is that so? I want to know the profiling info, but if the profiling info I get is for a different basic block structure, it's useless to me.

Where did I go wrong here? Which option is suitable in this case? The gcc version is 3.3.3.

Thanks.

Regards,
Timothy
Re: -fprofile-arcs changes the structure of basic blocks
Then I think I shouldn't use -fprofile-arcs. The reason I used -fprofile-arcs is that when I debugged a program without any flags, I saw that the frequency was zero. When I added this flag, I saw non-zero frequency values.

I checked the frequency after life_analysis and before combine_instructions. I used

    FOR_EACH_BB(bb) {
        // some code
    }

and checked bb->frequency.

So now the question is how I can see the frequency without any flags. The following is the small program I used to check the frequency.

    int foo(int i)
    {
        if (i < 2)
            return 2;
        else
            return 0;
    }
    int main()
    {
        int i;

        i = 0;
        if (i < 100)
            i = 3;
        else
            i = foo(i);

        return 0;
    }

On 6/24/05, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On Thu, 23 Jun 2005, Liu Haibin wrote:
>
> > Hi,
> >
> > I want to use profiling information. I know there're two relevant
> > fields in each basic block, count and frequency. I want to use
> > frequency because the compiled program is for another architecture so
> > it cannot run on the host.
>
> Besides the fact that, as Zdenek has pointed out, this is not a useful
> situation for -fprofile-arcs, ...
>
> >
> > My question is why it is so? I want to know the profiling info, but if
> > profiling info I get is for another different structure of basic
> > block, it's useless to me.
> >
>
> This is because it's inserting profiling code.
>
> This isn't magic, it's inserting code to do the profiling, which
> necessarily changes the basic blocks.
> The profiling info you get is for the original set of basic blocks.
>
>
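The point that instrumentation "is inserting code" can be pictured with a hand-written sketch. This is only a conceptual model, not what GCC actually emits: the names arc_count and foo_counted are hypothetical, and the real instrumentation uses GCC's own internal counters. It shows why each arm of the branch in foo gains an extra statement, which is enough to change the basic block layout seen in the dumps.

```c
/* Hypothetical counters, one per branch arm of foo(). GCC's real
   arc profiling uses its own internal counter tables. */
unsigned long arc_count[2];

/* foo() from the thread, with a counter update added by hand on
   each arm to mimic what arc instrumentation conceptually does. */
int foo_counted(int i)
{
    if (i < 2) {
        arc_count[0]++;   /* extra statement on the "then" arc */
        return 2;
    } else {
        arc_count[1]++;   /* extra statement on the "else" arc */
        return 0;
    }
}
```

Calling foo_counted with the values 0 through 4 would leave 2 in arc_count[0] and 3 in arc_count[1]; the return values are the same as for the uninstrumented foo.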
Re: -fprofile-arcs changes the structure of basic blocks
I found that optimization must be on in order to see the frequency.

Timothy

On 6/24/05, Liu Haibin <[EMAIL PROTECTED]> wrote:
> Then I think I shouldn't use -fprofile-arcs. The reason why I used
> -fprofile-arcs is when I debugged a program without any flags, I saw
> the frequency was zero. When I added this flag, I saw frequency with
> values.
>
> I checked the frequency after life_analysis and before
> combine_instructions. I used
>
>     FOR_EACH_BB(bb) {
>         // some code
>     }
>
> and checked the bb->frequency.
>
> So now the question is how I can see the frequency without any flags.
> The following was the small program I used to check the frequency.
>
>     int foo(int i)
>     {
>         if (i < 2)
>             return 2;
>         else
>             return 0;
>     }
>     int main()
>     {
>         int i;
>
>         i = 0;
>         if (i < 100)
>             i = 3;
>         else
>             i = foo(i);
>
>         return 0;
>     }
>
>
>
>
> On 6/24/05, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> > On Thu, 23 Jun 2005, Liu Haibin wrote:
> >
> > > Hi,
> > >
> > > I want to use profiling information. I know there're two relevant
> > > fields in each basic block, count and frequency. I want to use
> > > frequency because the compiled program is for another architecture so
> > > it cannot run on the host.
> >
> > Besides the fact that, as Zdenek has pointed out, this is not a useful
> > situation for -fprofile-arcs, ...
> > >
> > > My question is why it is so? I want to know the profiling info, but if
> > > profiling info I get is for another different structure of basic
> > > block, it's useless to me.
> > >
> >
> > This is because it's inserting profiling code.
> >
> > This isn't magic, it's inserting code to do the profiling, which
> > necessarily changes the basic blocks.
> > The profiling info you get is for the original set of basic blocks.
> >
> >
>
on nios2 define_insn indirect_call
Hi,

The nios2.md has a define_insn "indirect_call":

(define_insn "indirect_call"
  [(call (mem:QI (match_operand:SI 0 "register_operand" "r"))
         (match_operand 1 "" ""))
   (clobber (reg:SI RA_REGNO))]
  ""
  "callr\\t%0"
  [(set_attr "type" "control")])

But I find that test.c.26.flow2 contains this code:

(call_insn 41 37 42 1 0x101e17b0 (parallel [
            (call (mem:QI (reg/f:SI 3 r3 [58]) [0 S1 A8])
                (const_int 0 [0x0]))
            (clobber (reg:SI 31 ra))
        ]) 41 {indirect_call} (insn_list 40 (insn_list 39 (nil)))
    (expr_list:REG_DEAD (reg:SI 4 r4)
        (expr_list:REG_DEAD (reg/f:SI 3 r3 [58])
            (expr_list:REG_UNUSED (reg:SI 31 ra)
                (nil))))
    (expr_list (use (reg:SI 4 r4))
        (nil)))

Why is there a "parallel" for indirect_call in .26.flow2 but no "parallel" in define_insn indirect_call? Does it mean that the "define_insn indirect_call" in the md file implicitly has a "parallel" surrounding it?

Regards,
Timothy
on define_peephole2
Hi,

I have a problem with define_peephole2. In nios2.md, there is this define_insn:

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "arith_operand" "r,I")))]
  ""
  "add%i2\\t%0, %1, %z2"
  [(set_attr "type" "alu")])

I defined a peephole2 to replace this instruction.

(define_peephole2
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r")
                 ;(match_operand:SI 2 "arith_operand" "r")))]
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec_volatile:SI [(match_operand:SI 4 "custom_insn_opcode" "N")
                             (match_operand:SI 1 "register_operand" "r")
                             (match_operand:SI 2 "register_operand" "r")] CUSTOM_INII))]
  "
{
    operands[4] = const0_rtx;
}")

Because operand 2 in the replacement instruction must be a register, I changed "arith_operand" to "register_operand", hoping that it only replaces something like

    add r1, r2, r3

and not

    addi r1, r2, 9

I did a test with a file that contains

(insn/f 106 73 107 0 0x0 (set:SI (reg/f:SI 27 sp)
        (plus:SI (reg/f:SI 27 sp)
            (const_int -16 [0xfff0]))) -1 (nil)
    (nil))

and it seems that it did try to replace it with the new instruction. I got the following error:

isqrt.c:65: error: unrecognizable insn:
(insn 123 73 107 0 0x0 (set (reg/f:SI 27 sp)
        (unspec_volatile:SI [
                (const_int 0 [0x0])
                (reg/f:SI 27 sp)
                (const_int -16 [0xfff0])
            ] 117)) -1 (nil)
    (nil))
isqrt.c:65: internal compiler error: in extract_insn, at recog.c:2175

Any ideas why it still tries to replace it even when the operand is obviously not a register (const_int -16)?

Thanks.

Regards,
Timothy
Re: on define_peephole2
On 7/21/05, Liu Haibin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a problem on the define_peephole2. In nois2.md, there's such a
> define_insn
>
> (define_insn "addsi3"
>   [(set (match_operand:SI 0 "register_operand" "=r,r")
>         (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
>                  (match_operand:SI 2 "arith_operand" "r,I")))]
>   ""
>   "add%i2\\t%0, %1, %z2"
>   [(set_attr "type" "alu")])
>
> I defined a peephole2 to replace this instruction.
>
> (define_peephole2
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (plus:SI (match_operand:SI 1 "register_operand" "%r")
>                  ;(match_operand:SI 2 "arith_operand" "r")))]
>                  (match_operand:SI 2 "register_operand" "r")))]
>   ""
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (unspec_volatile:SI [(match_operand:SI 4 "custom_insn_opcode" "N")

My mistake. It should be match_operand:SI 3 here. Now there is no more error.

>                              (match_operand:SI 1 "register_operand" "r")
>                              (match_operand:SI 2 "register_operand" "r")]
> CUSTOM_INII))]
>   "
> {
>     operands[4] = const0_rtx;
> }")
>
> Because the operand 2 in the replacing instruction must be a register,
> I changed the "arith_operand" to "register_operand", hoping that it
> only replaces something like, add r1, r2, r3 instead of addi r1, r2, 9
>
> I did a test with a file, which contains
>
> (insn/f 106 73 107 0 0x0 (set:SI (reg/f:SI 27 sp)
>         (plus:SI (reg/f:SI 27 sp)
>             (const_int -16 [0xfff0]))) -1 (nil)
>     (nil))
>
> and it seems that it did try to replace it with the new instruct. And
> I got the following error:
>
> isqrt.c:65: error: unrecognizable insn:
> (insn 123 73 107 0 0x0 (set (reg/f:SI 27 sp)
>         (unspec_volatile:SI [
>                 (const_int 0 [0x0])
>                 (reg/f:SI 27 sp)
>                 (const_int -16 [0xfff0])
>             ] 117)) -1 (nil)
>     (nil))
> isqrt.c:65: internal compiler error: in extract_insn, at recog.c:2175
>
> Any ideas why it still tries to replace it even when it's obviously
> not a register (const_int -16)? Thanks.
>
>
> Regards,
> Timothy
>
How can I create a const rtx other than 0, 1, 2
Hi, There's const0_rtx, const1_rtx and const2_rtx. How can I create a const rtx other than 0, 1, 2? I want to use it in md file, like operand[1] = 111. I know I must use const rtx here. How can I do it? A simple question, but just no idea where to find the answer. Regards, Timothy
how to write a define_peephole2 that uses custom registers in nios2
Hi,

nios2 has a set of custom registers for custom instructions. They all start with "c", like

    custom 1 c4, c2, c0

I want to define a peephole to replace a sequence of instructions with the custom instruction above. The custom instruction is defined as follows in nios2.md:

(define_insn "custom_inii"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec_volatile:SI [(match_operand:SI 1 "custom_insn_opcode" "N")
                             (match_operand:SI 2 "register_operand" "r")
                             (match_operand:SI 3 "register_operand" "r")] CUSTOM_INII))]
  ""
  "custom\\t%1, %0, %2, %3"
  [(set_attr "type" "custom")])

But the problem is that it uses normal registers, like r8 and r9. How can I write the define_peephole2 so that it uses custom registers?

Thanks
Haibin
Re: how to write a define_peephole2 that uses custom registers in nios2
Thanks. I modified the related macros, like reg_class, REG_CLASS_FROM_LETTER(CHAR) and so on. But I have a problem with define_peephole2. After I modified the related macros, I replaced the "r" in "custom_inii" with "c".

(define_insn "custom_inii"
  [(set (match_operand:SI 0 "register_operand" "=c")
        (unspec_volatile:SI [(match_operand:SI 1 "custom_insn_opcode" "N")
                             (match_operand:SI 2 "register_operand" "c")
                             (match_operand:SI 3 "register_operand" "c")] CUSTOM_INII))]
  ""
  "custom\\t%1, %0, %2, %3"
  [(set_attr "type" "custom")])

And I defined the peephole as

(define_peephole2
  [(set (match_operand:SI 0 "register_operand" "")
        (plus:SI (match_operand:SI 1 "register_operand" "")
                 (match_operand:SI 2 "arith_operand" "")))]
  "REG_P(operands[2])"
  [(set (match_operand:SI 0 "register_operand" "=c")
        (unspec_volatile:SI [(match_operand:SI 3 "custom_insn_opcode" "N")
                             (match_operand:SI 1 "register_operand" "c")
                             (match_operand:SI 2 "register_operand" "c")] CUSTOM_INII))]
  "
{
    operands[3] = GEN_INT(100);
}")

I encounter the following error:

isqrt.c: In function `usqrt':
isqrt.c:65: error: insn does not satisfy its constraints:
(insn 118 41 36 1 0x1002f390 (set (reg/v:SI 4 r4 [82])
        (unspec_volatile:SI [
                (const_int 100 [0x64])
                (reg:SI 4 r4 [86])
                (reg:SI 3 r3 [88])
            ] 117)) 75 {custom_inii} (nil)
    (expr_list:REG_DEAD (reg:SI 3 r3 [88])
        (nil)))
isqrt.c:65: internal compiler error: in build_def_use, at regrename.c:782
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://www.altera.com/mysupport> for instructions.

I think the reason it failed is that there is no longer a define_insn "custom_inii" with general registers, because I already changed the "r" to "c".

However, this seems very difficult. The old insn patterns all use general registers, but the new insn patterns are defined with custom registers. Can I use something like

    operands[0] = gen_rtx_REG (DImode, REGNO(operands[0]));

here to force all the operands to be of a different kind? Or how else can I define the peephole?

On 7/28/05, James E Wilson <[EMAIL PROTECTED]> wrote:
> Liu Haibin wrote:
> > (match_operand:SI 2 "register_operand" "r")
> > But the problem is it uses normal register, like r8, r9. How can I
> > write the define_peephole2 so that it uses custom registers?
>
> See the "Constraints" section of the documentation. "r" means a general
> register. If you want a custom register, then you need to use a
> constraint letter that maps to a custom register.
>
> If the port does not already support custom registers, then you need to
> modify many of the register allocation related macros to add support for
> the custom registers. See the "Registers" and "Register Classes"
> sections of the documentation.
> --
> Jim Wilson, GNU Tools Support, http://www.specifix.com
>
some seemingly redundant register uses in nios gcc compiled assembly code
Hi,

I compiled the following code using nios gcc -da -O3 (gcc version 3.3.3):

    #include <math.h>
    #define PI (4*atan(1))

    double rad2deg(double rad)
    {
        return (180.0 * rad / (PI));
    }

The .s file has some code like this:

    mov     r4, zero
    movhi   r5, %hiadj(1072693248)
    addi    r5, r5, %lo(1072693248)
    mov     r16, r2
    mov     r17, r3
    call    atan
    mov     r5, r3
    mov     r4, r2
    mov     r6, zero
    movhi   r7, %hiadj(1074790400)
    addi    r7, r7, %lo(1074790400)
    call    __muldf3
    mov     r10, r2
    mov     r5, r17
    mov     r6, r10
    mov     r7, r3
    mov     r4, r16

In the .c.26.flow2 file:

(call_insn 23 19 28 0 0x0 (parallel [
            (set (reg:DF 2 r2)
                (call (mem:QI (symbol_ref:SI ("atan")) [0 S1 A8])
                    (const_int 0 [0x0])))
            (clobber (reg:SI 31 ra))
        ]) 44 {*call_value} (insn_list 21 (nil))
    (expr_list:REG_DEAD (reg:DF 4 r4)
        (expr_list:REG_UNUSED (reg:SI 31 ra)
            (nil)))
    (expr_list (use (reg:DF 4 r4))
        (nil)))
.
(call_insn/u 31 30 36 0 0x0 (parallel [
            (set (reg:DF 2 r2)
                (call (mem:QI (symbol_ref:SI ("__muldf3")) [0 S1 A8])
                    (const_int 0 [0x0])))
            (clobber (reg:SI 31 ra))
        ]) 44 {*call_value} (insn_list 27 (insn_list 29 (nil)))
    (expr_list:REG_DEAD (reg:DF 4 r4)
        (expr_list:REG_DEAD (reg:DF 6 r6)
            (expr_list:REG_UNUSED (reg:SI 31 ra)
                (expr_list:REG_EH_REGION (const_int -1 [0x])
                    (nil)))))
    (expr_list (use (reg:DF 6 r6))
        (expr_list (use (reg:DF 4 r4))
            (nil))))

From the RTL we can see that these two calls don't use r5, so why do both the assembly code and the RTL have instructions involving r5, like

    movhi   r5, %hiadj(1072693248)
    addi    r5, r5, %lo(1072693248)

(move a 32-bit constant into a register) and

    mov     r5, r3

In nios2, r2 and r3 are for the return value; r4, r5, r6 and r7 are for register arguments.

Does the following RTL implicitly indicate that r5 is used?

    (expr_list (use (reg:DF 6 r6))
        (expr_list (use (reg:DF 4 r4))

Thanks.

Regards,
Haibin
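One way to look at the (use (reg:DF 4 r4)) question is to count how many hard registers a value of a given mode needs. The sketch below assumes 32-bit (4-byte) general registers and an 8-byte DFmode value, and it assumes that a multi-word value occupies consecutive register numbers starting at the one named in the RTL. The helper names are hypothetical; the arithmetic only mirrors the usual rounding-up definition that ports use for this purpose.

```c
/* Hypothetical helper: number of word-sized hard registers needed to
   hold a value of mode_size_bytes, rounding up to whole registers. */
int hard_regs_needed(int mode_size_bytes, int word_size_bytes)
{
    return (mode_size_bytes + word_size_bytes - 1) / word_size_bytes;
}

/* Hypothetical helper: last register number covered by a value that
   starts in first_regno, assuming consecutive register numbering. */
int last_regno_covered(int first_regno, int mode_size_bytes,
                       int word_size_bytes)
{
    return first_regno
           + hard_regs_needed(mode_size_bytes, word_size_bytes) - 1;
}
```

Under those assumptions, an 8-byte value starting at r4 covers r4 and r5, and one starting at r6 covers r6 and r7, which would be consistent with r5 and r7 being loaded before the calls even though only r4 and r6 are named in the use lists.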
arguments used in .c.26.flow2 are not used in assembly codes
Hi,

I compiled the following code using nios gcc -da -O3 (gcc version 3.3.3):

    #include <math.h>
    #define PI (4*atan(1))

    double rad2deg(double rad)
    {
        return (180.0 * rad / (PI));
    }

The beginning of the .s file is

rad2deg:
    addi    sp, sp, -16
    stw     fp, 8(sp)
    mov     r6, zero
    mov     fp, sp
    movhi   r7, %hiadj(1080459264)
    addi    r7, r7, %lo(1080459264)
    stw     ra, 12(sp)
    stw     r16, 4(sp)
    stw     r17, 0(sp)
    call    __muldf3
    mov     r4, zero
    movhi   r5, %hiadj(1072693248)
    addi    r5, r5, %lo(1072693248)
    mov     r16, r2
    mov     r17, r3
    call    atan
    ..

The RTL corresponding to "call __muldf3" in the .c.26.flow2 file is

(call_insn/u 17 16 21 0 0x0 (parallel [
            (set (reg:DF 2 r2)
                (call (mem:QI (symbol_ref:SI ("__muldf3")) [0 S1 A8])
                    (const_int 0 [0x0])))
            (clobber (reg:SI 31 ra))
        ]) 44 {*call_value} (insn_list 15 (nil))
    (expr_list:REG_DEAD (reg:DF 4 r4)
        (expr_list:REG_DEAD (reg:DF 6 r6)
            (expr_list:REG_UNUSED (reg:SI 31 ra)
                (expr_list:REG_EH_REGION (const_int -1 [0x])
                    (nil)))))
    (expr_list (use (reg:DF 6 r6))
        (expr_list (use (reg:DF 4 r4))
            (nil))))

According to the RTL, it uses r4, r5, r6 and r7 as arguments. But the assembly code shows that neither r4 nor r5 is ever set before "call __muldf3". Any idea why this is so?

Thanks
Haibin
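The decimal constants in the listing are easier to read once they are viewed as halves of a double. The sketch below assumes IEEE 754 doubles on the machine where it is compiled; split_double is a hypothetical helper, not part of GCC. It splits a double into its high and low 32-bit words so they can be compared with the values loaded by movhi/addi.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into its high and low 32-bit halves.
   Assumes the host uses the 64-bit IEEE 754 format for double. */
void split_double(double d, uint32_t *hi, uint32_t *lo)
{
    uint64_t bits;

    /* memcpy avoids aliasing problems when reinterpreting the bits */
    memcpy(&bits, &d, sizeof bits);
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)(bits & 0xffffffffu);
}
```

For 180.0 the high word is 1080459264 and the low word is 0, which matches r7 and r6 (zero) before "call __muldf3"; for 1.0 the high word is 1072693248, matching r5 before "call atan". A plausible reading, not confirmed in the thread, is that r4 and r5 need no setup before __muldf3 because the incoming argument rad is already there when rad2deg is entered.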
how to add source or header file in gcc
Hi, I'd like to add some source and header files into gcc. I think I probably need to make some change in Makefile.in. But the Makefile.in looks very complicated. Could anyone give some advice on this? Regards, Haibin
on data dependence
Hi,

I got a dump of sha.c.27.flow2 from gcc 3.4.1. I don't quite understand the LOG_LINKS of insn 498.

LOG_LINKS in insn 498 shows that it has a data dependence (a read after write dependence) with insn 3. Why is it so? I don't see any dependence between "mov r14 r4" and "addi r3, r4, 28". The whole dump of the basic block is at the bottom.

(insn 3 4 11 0 (set (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
        (reg:SI 4 r4 [ sha_info ])) 8 {movsi_internal} (nil)
    (expr_list:REG_DEAD (reg:SI 4 r4 [ sha_info ])
        (nil)))

(insn 498 375 560 0 (set (reg/f:SI 3 r3 [235])
        (plus:SI (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
            (const_int 28 [0x1c]))) 20 {addsi3} (insn_list 3 (nil))
    (nil))

;; Start of basic block 0, registers live: 4 [r4] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 27 [sp] 31 [ra]
(note 289 2 597 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn/f 597 289 598 0 (set:SI (reg/f:SI 27 sp)
        (plus:SI (reg/f:SI 27 sp)
            (const_int -336 [0xfeb0]))) -1 (nil)
    (nil))

(insn/f 598 597 599 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
                (const_int 332 [0x14c])) [0 S4 A32])
        (reg:SI 16 r16)) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 16 r16)
        (nil)))

(insn/f 599 598 600 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
                (const_int 328 [0x148])) [0 S4 A32])
        (reg:SI 17 r17)) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 17 r17)
        (nil)))

(insn/f 600 599 601 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
                (const_int 324 [0x144])) [0 S4 A32])
        (reg:SI 18 r18)) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 18 r18)
        (nil)))

(insn/f 601 600 602 0 (set:SI (mem:SI (plus:SI (reg/f:SI 27 sp)
                (const_int 320 [0x140])) [0 S4 A32])
        (reg:SI 19 r19)) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 19 r19)
        (nil)))

(note 602 601 4 0 NOTE_INSN_PROLOGUE_END)

(note 4 602 3 0 NOTE_INSN_FUNCTION_BEG)

(insn 3 4 11 0 (set (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
        (reg:SI 4 r4 [ sha_info ])) 8 {movsi_internal} (nil)
    (expr_list:REG_DEAD (reg:SI 4 r4 [ sha_info ])
        (nil)))

(insn 11 3 375 0 (set (reg/v:SI 5 r5 [orig:47 i ] [47])
        (const_int 0 [0x0])) 8 {movsi_internal} (nil)
    (nil))

(insn 375 11 498 0 (set (reg/s:SI 6 r6 [54])
        (const_int 15 [0xf])) 8 {movsi_internal} (nil)
    (expr_list:REG_EQUIV (const_int 15 [0xf])
        (nil)))

(insn 498 375 560 0 (set (reg/f:SI 3 r3 [235])
        (plus:SI (reg/v/f:SI 14 r14 [orig:46 sha_info ] [46])
            (const_int 28 [0x1c]))) 20 {addsi3} (insn_list 3 (nil))
    (nil))

(insn 560 498 12 0 (set (reg/f:SI 4 r4 [266])
        (reg/f:SI 27 sp)) 8 {movsi_internal} (nil)
    (nil))
;; End of basic block 0, registers live: 3 [r3] 4 [r4] 5 [r5] 6 [r6] 14 [r14] 27 [sp] 31 [ra]

Regards,
Haibin
extract register input, output and operator from rtl right before peepholes
Hi,

I'm doing some coding right before the peephole2 pass. I'd like to have a function that takes RTL as input and returns the register inputs, the register output and the operator. For example,

input:

(insn 496 34 29 1 (set (reg/f:SI 3 r3 [235])
        (plus:SI (reg/f:SI 3 r3 [235])
            (const_int 4 [0x4]))) 20 {addsi3} (insn_list:REG_DEP_ANTI 28 (nil))
    (nil))

returns:

    inputs: r3, 4. output: r3. operator: plus.

I know sched_analyze() in sched-deps.c builds the dependencies in basic blocks, and I hope I can find some useful functions there. I went through the code roughly and didn't really understand it.

Because the RTL insns are right before peephole2, they have already been heavily processed, which makes things easier. I hope I can find an existing function to use instead of writing something like REGNO(XEXP(SET_SRC(PATTERN(x)), 0)). I believe sched-deps.c has something useful.

Can someone help with this?

Regards,
Haibin
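The kind of summary being asked for can be sketched on a toy expression tree. Everything below is hypothetical: toy_rtx, toy_summary and toy_summarize are illustration-only names and are not GCC's rtx type or any GCC function. The sketch handles only the shape in the example, a set of a register from a plus of two leaves; inside GCC the corresponding fields are normally reached through accessors such as PATTERN, SET_DEST, SET_SRC, GET_CODE and XEXP.

```c
#include <stddef.h>

/* Toy node kinds that mimic (set (reg) (plus (reg) (const_int))). */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_SET };

struct toy_rtx {
    enum toy_code code;
    long value;                 /* register number or constant value */
    const struct toy_rtx *op0;  /* first operand, NULL for leaves */
    const struct toy_rtx *op1;  /* second operand, NULL for leaves */
};

struct toy_summary {
    enum toy_code op;     /* operator of the source expression */
    long output_reg;      /* destination register number */
    long inputs[2];       /* leaf values of the source, in order */
    int input_is_reg[2];  /* 1 if the leaf is a register, 0 if constant */
    int n_inputs;
};

/* Fill *out from a (set (reg) (plus leaf leaf)) tree.
   Returns 0 on success, -1 for any other shape. */
int toy_summarize(const struct toy_rtx *set, struct toy_summary *out)
{
    const struct toy_rtx *dest, *src, *ops[2];
    int i;

    if (set == NULL || set->code != TOY_SET)
        return -1;
    dest = set->op0;
    src = set->op1;
    if (dest == NULL || dest->code != TOY_REG)
        return -1;
    if (src == NULL || src->code != TOY_PLUS)
        return -1;

    ops[0] = src->op0;
    ops[1] = src->op1;
    out->op = src->code;
    out->output_reg = dest->value;
    out->n_inputs = 0;
    for (i = 0; i < 2; i++) {
        if (ops[i] == NULL
            || (ops[i]->code != TOY_REG && ops[i]->code != TOY_CONST_INT))
            return -1;
        out->inputs[out->n_inputs] = ops[i]->value;
        out->input_is_reg[out->n_inputs] = (ops[i]->code == TOY_REG);
        out->n_inputs++;
    }
    return 0;
}
```

Built for the example insn, the tree would have register 3 as destination and a plus of register 3 and constant 4 as source, and the summary would report operator TOY_PLUS, output 3 and inputs 3 (a register) and 4 (a constant). A real pass would also have to cope with parallel patterns, memory operands and other operators, which this sketch deliberately rejects.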
about REG_DEP_OUTPUT dependence
Hi,

Can someone help me understand why there is a REG_DEP_OUTPUT (write after write) dependence between jump_insn 547 and insn 82?

(insn 82 543 478 3 (set (mem/s:SI (reg/f:SI 6 r6 [224]) [4 W S4 A32])
        (reg:SI 2 r2 [95])) 8 {movsi_internal} (insn_list 81 (nil))
    (expr_list:REG_DEAD (reg:SI 2 r2 [95])
        (nil)))

(insn 478 82 547 3 (set (reg/f:SI 6 r6 [224])
        (plus:SI (reg/f:SI 6 r6 [224])
            (const_int 4 [0x4]))) 20 {addsi3} (insn_list:REG_DEP_ANTI 65 (insn_list:REG_DEP_ANTI 66 (insn_list:REG_DEP_ANTI 73 (insn_list:REG_DEP_ANTI 80 (insn_list:REG_DEP_ANTI 82 (nil))))))
    (nil))

(jump_insn 547 478 93 3 (set (pc)
        (if_then_else (ne:SI (reg/v:SI 7 r7 [orig:270 i ] [270])
                (const_int 0 [0x0]))
            (label_ref 88)
            (pc))) 61 {*cbranch} (insn_list 543 (insn_list:REG_DEP_OUTPUT 82 (nil)))
    (expr_list:REG_BR_PROB (const_int 9844 [0x2674])
        (nil)))

Regards,
Haibin
do -fprofile-arcs and -fbranch-probabilities help to set bb->count?
Hi,

I wanted to use bb->count, so I expected that -fprofile-arcs and -fbranch-probabilities would help. I added a printf just before the peephole2 optimization and ran the following.

    $gcc -O3 -fprofile-arcs test.c -o test
    $./test (which produced test.gcno only, but no test.gcda)
    $gcc -O3 -fprofile-arcs -fbranch-probabilities test.c -o test

But it turned out that all the basic block counts were 0. Any idea how I can get bb->count set?

Regards,
Haibin
Re: do -fprofile-arcs and -fbranch-probabilities help to set bb->count?
I ran a test on a normal PC and it worked. Thanks.

However, I'm using the gcc nios2 port. The executable file is downloaded to a board, which is attached to my PC. So while the executable is running on the board, it doesn't produce any gcda file.

According to the Altera nios2 people, I can do some configuration so that the executable will be able to write output to the host PC through normal fputs or fwrite. So is it feasible to make the executable write the gcda file to my host PC via fputs or fwrite?

Regards,
Haibin

On 2/21/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> Liu Haibin wrote:
> > Hi,
> >
> > I wanted to use bb->count, so I expected that -fprofile-arcs and
> > -fbranch-probabilities would help. I added printf just before
> > peephole2 optimization and ran the following.
> >
> > $gcc -O3 -fprofile-arcs test.c -o test
> > $./test (which produced test.gcno only, but no test.gcda)
> > $gcc -O3 -fprofile-arcs -fbranch-probabilities test.c -o test
>
> Easier to use -fprofile-generate and -fprofile-use:
>
> gcc -O3 -fprofile-generate test.c -o test (produces test.gcno)
> ./test (now should have as well test.gcda)
> gcc -O3 -fprofile-use test.c -o test
>
> Paolo
>