After the subreg pass, the RTL looks like this:

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
        (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)
                (const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
        (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)) 71 {movdi_internal} (nil))

...

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
        (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)
                (const_int 16 [0x10])))) 71 {movdi_internal} (nil))

As we can see, all three start from the symbol_ref data and add their
own offset (none for insn 6).

After cse, however, we get this:

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
        (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)
                (const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
        (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)) 71 {movdi_internal} (nil))

...

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
        (plus:DI (reg/f:DI 75)
            (const_int 16 [0x10]))) 2 {adddi3_port} (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0
                (const_int 16 [0x10])))
        (nil)))

As we can see, instead of expressing all three addresses in terms of
register 74, the CSE pass rewrites only the last one, and it does so in
terms of register 75.

I have put the whole cse dump at the end of this email; I didn't want to
make this part too long.

Thanks again,
Jean Christophe Beyler

------------------
Dump of cse1
------------------

;; Function foo (foo)

3 basic blocks, 2 edges.

Basic block 0 , next 2, loop_depth 0, count 0, freq 10000, maybe hot.
Predecessors:
Successors: 2 [100.0%] (fallthru)

Basic block 2 , prev 0, next 1, loop_depth 0, count 0, freq 10000, maybe hot.
Predecessors: ENTRY [100.0%] (fallthru)
Successors: EXIT [100.0%] (fallthru)

Basic block 1 , prev 2, loop_depth 0, count 0, freq 10000, maybe hot.
Predecessors: 2 [100.0%] (fallthru)
Successors:

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1)

foo

Dataflow summary:
def_info->table_size = 0, use_info->table_size = 0
;; invalidated by call 2 [r2] 4 [r4] 5 [r5] 6 [r6] 7 [r7] 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 26 [r26] 27 [r27] 28 [r28] 29 [r29] 30 [r30] 31 [r31] 32 [r32] 33 [r33] 34 [r34] 35 [r35] 36 [r36] 37 [r37] 38 [r38] 39 [r39] 40 [r40] 41 [r41] 42 [r42] 43 [r43] 44 [r44] 45 [r45] 46 [r46] 47 [r47] 63 [r63] 64 [$rap] 65 [cc] 66 [acc]
;; hardware regs used 0 [r0] 1 [r1] 3 [r3]
;; regular block artificial uses 0 [r0] 1 [r1] 3 [r3] 62 [r62]
;; eh block artificial uses 0 [r0] 1 [r1] 3 [r3] 62 [r62]
;; entry block defs 0 [r0] 1 [r1] 3 [r3] 6 [r6] 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 62 [r62] 63 [r63]
;; exit block uses 1 [r1] 3 [r3] 6 [r6] 62 [r62]
;; regs ever live 6[r6]

( )->[0]->( 2 )
;; bb 0 artificial_defs: { d-1(0){ }d-1(1){ }d-1(3){ }d-1(6){ }d-1(8){ }d-1(9){ }d-1(10){ }d-1(11){ }d-1(12){ }d-1(13){ }d-1(14){ }d-1(15){ }d-1(62){ }d-1(63){ }}
;; bb 0 artificial_uses: { }

( 0 )->[2]->( 1 )
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u-1(0){ }u-1(1){ }u-1(3){ }u-1(62){ }}

( 2 )->[1]->( )
;; bb 1 artificial_defs: { }
;; bb 1 artificial_uses: { u-1(1){ }u-1(3){ }u-1(6){ }u-1(62){ }}

Finding needed instructions:
  Adding insn 23 to worklist
Finished finding needed instructions:
processing block 2 live out = 0 [r0] 1 [r1] 3 [r3] 6 [r6] 62 [r62]
  Adding insn 17 to worklist
  Adding insn 13 to worklist
  Adding insn 12 to worklist
  Adding insn 11 to worklist
  Adding insn 10 to worklist
  Adding insn 9 to worklist
  Adding insn 8 to worklist
  Adding insn 7 to worklist
  Adding insn 6 to worklist
  Adding insn 5 to worklist
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1)
;; Following path with 11 sets: 2
deferring rescan insn with uid = 10.
deferring rescan insn with uid = 17.

try_optimize_cfg iteration 1

(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
        (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)
                (const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
        (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)) 71 {movdi_internal} (nil))

(insn 7 6 8 2 ex1b.c:8 (set (reg:DI 77 [ data+8 ])
        (mem/s:DI (reg/f:DI 74) [2 data+8 S8 A64])) 71 {movdi_internal} (nil))

(insn 8 7 9 2 ex1b.c:8 (set (reg:DI 78 [ data ])
        (mem/s:DI (reg/f:DI 75) [2 data+0 S8 A64])) 71 {movdi_internal} (nil))

(insn 9 8 10 2 ex1b.c:8 (set (reg:DI 76)
        (plus:DI (reg:DI 77 [ data+8 ])
            (reg:DI 78 [ data ]))) 2 {adddi3_port} (nil))

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
        (plus:DI (reg/f:DI 75)
            (const_int 16 [0x10]))) 2 {adddi3_port} (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("data") <var_decl 0xb7d35058 data>)
                (const_int 16 [0x10])))
        (nil)))

(insn 11 10 12 2 ex1b.c:8 (set (reg:DI 80 [ data+16 ])
        (mem/s:DI (reg/f:DI 79) [2 data+16 S8 A64])) 71 {movdi_internal} (nil))

(insn 12 11 13 2 ex1b.c:8 (set (reg:DI 73)
        (plus:DI (reg:DI 76)
            (reg:DI 80 [ data+16 ]))) 2 {adddi3_port} (nil))

(insn 13 12 17 2 ex1b.c:8 (set (reg:DI 72 [ <result> ])
        (reg:DI 73)) 71 {movdi_internal} (nil))

(insn 17 13 23 2 ex1b.c:10 (set (reg/i:DI 6 r6)
        (reg:DI 73)) 71 {movdi_internal} (nil))

(insn 23 17 0 2 ex1b.c:10 (use (reg/i:DI 6 r6)) -1 (nil))

starting the processing of deferred insns
rescanning insn with uid = 10.
deleting insn with uid = 10.
rescanning insn with uid = 17.
deleting insn with uid = 17.
ending the processing of deferred insns

On Wed, Jul 15, 2009 at 12:25 PM, Adam Nemet<ane...@caviumnetworks.com> wrote:
> Jean Christophe Beyler <jean.christophe.bey...@gmail.com> writes:
>> uint64_t foo (void)
>> {
>>   return data[0] + data[1] + data[2];
>> }
>>
>> And this generates :
>>
>> la r9,data
>> la r7,data+8
>> ldd r6,0(r7)
>> ldd r8,0(r9)
>> ldd r7,16(r9)
>>
>> I'm trying to see if there is a problem with my rtx costs function
>> because again, I don't understand why it would generate 2 la instead
>> of using an offset of 8 and 16.
>
> You probably want to look at the RTL dumps. This code should have been
> expanded with the correct offsets (at least that is what happens on
> MIPS). I don't see how later passes would modify the code other than
> removing 2 of the 3 "la rX, data" insns.
>
> Adam
>
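
For anyone who wants to reproduce the dumps, a self-contained version of
the test case could look roughly like the sketch below. Only foo appears
in this thread; the definition of data (three elements and their initial
values) is an assumption made for illustration, and the file name is
simply the one that shows up in the insn locations.

```c
/* ex1b.c: reproducer sketch. Only foo() is taken from the thread;
   the definition of data below is an assumption for illustration. */
#include <stdint.h>

/* Assumed: a global array holding at least three 64-bit elements. */
uint64_t data[3] = { 1, 2, 3 };

uint64_t foo (void)
{
  /* Three 8-byte loads, at data+0, data+8 and data+16. */
  return data[0] + data[1] + data[2];
}
```

Compiling this with optimization enabled and -fdump-rtl-cse1 (or
-fdump-rtl-all to get every pass) should produce the RTL dumps; the exact
dump file names depend on the GCC version.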