On Thu, Apr 26, 2012 at 6:16 PM, Paulo J. Matos <pa...@matos-sorge.com> wrote:
> Hi,
>
> I am facing a problem with the GCC47 register allocation and my movmemqi.
> GCC46 dealt very well with the problem but GCC47 keeps throwing register
> spill failures at me.
>
> My backend has very few registers. There are 3 chip registers in total
> (class CHIP_REGS): one of them (XL) is used for memory references (class
> ADDR_REGS) and the other two (AL, AH) are for normal use (DATA_REGS), so
> CHIP_REGS = ADDR_REGS U DATA_REGS.
>
> There are a couple of other memory mapped registers, but all loads and
> stores go through CHIP_REGS.
>
> My chip has a block copy instruction which needs the source address in XL,
> the destination address in AH and the count in AL. My movmemqi is similar
> to movmemsi in rx.
>
> (define_expand "movmemqi"
>   [(use (match_operand:BLK 0 "memory_operand"))
>    (use (match_operand:BLK 1 "memory_operand"))
>    (use (match_operand:QI 2 "general_operand"))
>    (use (match_operand:QI 3 "general_operand"))]
>   ""
> {
>   rtx dst_addr = XEXP(operands[0], 0);
>   rtx src_addr = XEXP(operands[1], 0);
>   rtx dst_reg = gen_rtx_REG(QImode, RAH);
>   rtx src_reg = gen_rtx_REG(QImode, RXL);
>   rtx cnt_reg = gen_rtx_REG(QImode, RAL);
>
>   emit_move_insn(cnt_reg, operands[2]);
>
>   if(GET_CODE(dst_addr) == PLUS)
>     {
>       emit_move_insn(dst_reg, XEXP(dst_addr, 0));
>       emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
>     }
>   else
>     emit_move_insn(dst_reg, dst_addr);
>
>   if(GET_CODE(src_addr) == PLUS)
>     {
>       emit_move_insn(src_reg, XEXP(src_addr, 0));
>       emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
>     }
>   else
>     emit_move_insn(src_reg, src_addr);
>
>   emit_insn(gen_bc2());
>
>   DONE;
> })
>
> (define_insn "bc2"
>   [(set (reg:QI RAL) (const_int 0))
>    (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
>    (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
>    (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
>   ""
>   "bc2")
>
> The parallel in bc2 sets up what the bc2 chip instruction modifies: it
> copies the block at XL to AH, moves XL to point to the end of the source
> block and AH to point to the end of the destination block, and sets AL
> to 0.
>
> The C code
> int **
> t25 (int *d, int **s)
> {
>   memcpy (d, *s, 16);
>   return s;
> }
>
> turns into the following after asmcons (-Os passed in):
> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn 2 5 3 2 (parallel [
>             (set (reg/v/f:QI 22 [ d ])
>                 (reg:QI 1 AL [ d ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:3 6 {*movqi}
>      (expr_list:REG_DEAD (reg:QI 1 AL [ d ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 3 2 4 2 (parallel [
>             (set (reg/v/f:QI 23 [ s ])
>                 (reg:QI 0 AH [ s ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:3 6 {*movqi}
>      (expr_list:REG_DEAD (reg:QI 0 AH [ s ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
>
> (insn 7 4 8 2 (parallel [
>             (set (reg/f:QI 24 [ *s_1(D) ])
>                 (mem/f:QI (reg/v/f:QI 23 [ s ]) [2 *s_1(D)+0 S1 A16]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_UNUSED (reg:CC 13 CC)
>         (nil)))
>
> (insn 8 7 9 2 (parallel [
>             (set (reg:QI 1 AL)
>                 (const_int 16 [0x10]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_UNUSED (reg:CC 13 CC)
>         (nil)))
>
> (insn 9 8 10 2 (parallel [
>             (set (reg:QI 0 AH)
>                 (reg/v/f:QI 22 [ d ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_DEAD (reg/v/f:QI 22 [ d ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 10 9 11 2 (parallel [
>             (set (reg:QI 3 X)
>                 (reg/f:QI 24 [ *s_1(D) ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_DEAD (reg/f:QI 24 [ *s_1(D) ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 11 10 16 2 (parallel [
>             (set (reg:QI 1 AL)
>                 (const_int 0 [0]))
>             (set (mem:BLK (reg:QI 0 AH) [0 A16])
>                 (mem:BLK (reg:QI 3 X) [0 A16]))
>             (set (reg:QI 3 X)
>                 (plus:QI (reg:QI 3 X)
>                     (reg:QI 1 AL)))
>             (set (reg:QI 0 AH)
>                 (plus:QI (reg:QI 0 AH)
>                     (reg:QI 1 AL)))
>         ]) memcpy.i:4 21 {bc2}
>      (expr_list:REG_UNUSED (reg:QI 3 X)
>         (expr_list:REG_UNUSED (reg:QI 1 AL)
>             (expr_list:REG_UNUSED (reg:QI 0 AH)
>                 (nil)))))
>
> (insn 16 11 19 2 (parallel [
>             (set (reg/i:QI 1 AL)
>                 (reg/v/f:QI 23 [ s ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:6 6 {*movqi}
>      (expr_list:REG_DEAD (reg/v/f:QI 23 [ s ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 19 16 0 2 (use (reg/i:QI 1 AL)) memcpy.i:6 -1
>      (nil))
>
> Pass ira starts by reporting:
> ;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)
>
> starting the processing of deferred insns
> ending the processing of deferred insns
> df_analyze called
> Building IRA IR
> starting the processing of deferred insns
> ending the processing of deferred insns
> df_analyze called
> init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))
>
> Pass 0 for finding pseudo/allocno costs
>
> a1 (r24,l0) best ADDR_REGS, allocno ADDR_REGS
> a0 (r23,l0) best ADDR_REGS, allocno ADDR_REGS
> a2 (r22,l0) best GENERAL_REGS, allocno GENERAL_REGS
>
> a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:4000 CHIP_REGS:4000
> FAKE_REGS:4000 MEM_REGS:4000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
> a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0
> FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
> a2(r22,l0) costs: ADDR_REGS:0 DATA_REGS:0 STACK_REGS:0 CHIP_REGS:0
> FAKE_REGS:0 MEM_REGS:0 GENERAL_REGS:2000 ALL_REGS:2000 MEM:7000
>
>
> Pass 1 for finding pseudo/allocno costs
>
> r24: preferred ADDR_REGS, alternative NO_REGS, allocno ADDR_REGS
> r23: preferred ADDR_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
> r22: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>
> a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:6000 CHIP_REGS:4000
> FAKE_REGS:6000 MEM_REGS:6000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
> a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0
> FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
> a2(r22,l0) costs: GENERAL_REGS:2000 MEM:7000
>
> Insn 19(l0): point = 0
> Insn 16(l0): point = 2
> Insn 11(l0): point = 4
> Insn 10(l0): point = 6
> Insn 9(l0): point = 8
> Insn 8(l0): point = 10
> Insn 7(l0): point = 12
> Insn 3(l0): point = 14
> Insn 2(l0): point = 16
> a0(r23): [3..14]
> a1(r24): [7..12]
> a2(r22): [9..16]
> Compressing live ranges: from 19 to 2 - 10%
> Ranges after the compression:
> a0(r23): [0..1]
> a1(r24): [0..1]
> a2(r22): [0..1]
> +++Allocating 12 bytes for conflict table (uncompressed size 12)
> ;; a0(r23,l0) conflicts: a1(r24,l0) a2(r22,l0)
> ;; total conflict hard regs: 0 1 3
> ;; conflict hard regs: 0 1 3
>
> ;; a1(r24,l0) conflicts: a0(r23,l0) a2(r22,l0)
> ;; total conflict hard regs:
> ;; conflict hard regs:
>
> ;; a2(r22,l0) conflicts: a0(r23,l0) a1(r24,l0)
> ;; total conflict hard regs: 0 1
> ;; conflict hard regs: 0 1
>
> regions=1, blocks=3, points=2
> allocnos=3 (big 0), copies=0, conflicts=0, ranges=3
>
> **** Allocnos coloring:
>
>
> Loop 0 (parent -1, header bb2, depth 0)
> bbs: 2
> all: 0r23 1r24 2r22
> modified regnos: 22 23 24
> border:
> Pressure: GENERAL_REGS=4
> Hard reg set forest:
> 0:( 0 1 3 7-12)@0
> 1:( 3 7-12)@18000
> 2:( 7-12)@26000
> 3:( 3)@0
> Allocno a0r23 of GENERAL_REGS(9) has 6 avail. regs 7-12, node: 7-12
> (confl regs = 0-6 13-12)
> Allocno a1r24 of ADDR_REGS(1) has 1 avail. regs 3, node: 3 (confl
> regs = 0-2 4-12)
> Allocno a2r22 of GENERAL_REGS(9) has 7 avail. regs 3 7-12, node: 3
> 7-12 (confl regs = 0-2 4-6 13-12)
> Pushing a2(r22,l0)(cost 0)
> Making a1(r24,l0) colorable
> Pushing a1(r24,l0)(cost 0)
> Pushing a0(r23,l0)(cost 0)
> Popping a0(r23,l0) -- assign reg 7
> Popping a1(r24,l0) -- assign reg 3
> Popping a2(r22,l0) -- assign reg 8
> Disposition:
> 2:r22 l0 8 0:r23 l0 7 1:r24 l0 3
> New iteration of spill/restore move
> +++Costs: overall 7000, reg 7000, mem 0, ld 0, st 0, move 0
> +++ move loops 0, new jumps 0
>
>
> This is followed by reload breaking after a spill it can't handle:
> ;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)
>
> insn=2, live_throughout: 0, 5, dead_or_set: 1, 22
> insn=3, live_throughout: 5, 22, dead_or_set: 0, 23
> insn=7, live_throughout: 5, 22, 23, dead_or_set: 24
> insn=8, live_throughout: 5, 22, 23, 24, dead_or_set: 1
> insn=9, live_throughout: 1, 5, 23, 24, dead_or_set: 0, 22
> insn=10, live_throughout: 0, 1, 5, 23, dead_or_set: 3, 24
> insn=11, live_throughout: 5, 23, dead_or_set: 0, 1, 3
> insn=16, live_throughout: 5, dead_or_set: 1, 23
> insn=19, live_throughout: 1, 5, dead_or_set:
> init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))
> changing reg in insn 2
> changing reg in insn 9
> changing reg in insn 3
> changing reg in insn 16
> changing reg in insn 7
> changing reg in insn 7
> changing reg in insn 7
> changing reg in insn 10
> Spilling for insn 7.
> Using reg 3 for reload 0
> Try Assign 24(a1), cost=0
> changing reg in insn 7
> changing reg in insn 10
> Register 24 now on stack.
>
> Spilling for insn 7.
> Using reg 3 for reload 0
> Using reg 3 for reload 1
> Using reg 1 for reload 4
> Spilling for insn 10.
> reload failure for reload 0
>
> Reloads for insn # 10
> Reload 0: reload_in (QI) = (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
> ADDR_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
> reload_in_reg: (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
> Reload 1: CHIP_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), optional, can't
> combine, secondary_reload_p
> Reload 2: reload_in (QI) = (mem/f:QI (reg/v/f:QI 7 @H'fff8 [orig:23 s ]
> [23]) [2 *s_1(D)+0 S1 A16])
> GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), optional, can't combine
> reload_in_reg: (reg/f:QI 24 [ *s_1(D) ])
> secondary_in_reload = 1
>
>
>
> What's interesting is that GCC46 manages this fine since during IRA it
> reports:
> Pass 0 for finding pseudo/allocno costs
>
> a1 (r24,l0) best ADDR_REGS, cover GENERAL_REGS
> a0 (r23,l0) best ADDR_REGS, cover GENERAL_REGS
> a2 (r22,l0) best GENERAL_REGS, cover GENERAL_REGS
>
> This differs from what GCC47 does and seems to work better.
> I would like help on how to best handle this situation under GCC47.

Why not simply not provide movmem? What you have looks open-coded and
not in any way "optimized".

Richard.

> Cheers,
>
> --
> PMatos
>
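
For readers following the thread, the effect that the quoted bc2 pattern
describes (copy the block, advance both address registers by the count,
leave the count at 0) can be modelled in plain C. This is only a sketch:
the struct chip_regs type, the bc2_model function, the flat byte-array
memory and the register widths are assumptions made for illustration and
are not part of the backend discussed above. Behaviour for overlapping
blocks is not stated in the original post, so the model simply copies
forward one element at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the three chip registers named in the post.
   The field widths are an assumption; the real QImode size is not
   stated here. */
struct chip_regs {
    unsigned xl; /* source address (XL) */
    unsigned ah; /* destination address (AH) */
    unsigned al; /* remaining count (AL) */
};

/* Hypothetical model of the described bc2 effect on a flat memory
   array: copy AL elements from mem[XL] to mem[AH], leaving XL and AH
   pointing just past their blocks and AL equal to 0. */
static void bc2_model(struct chip_regs *r, unsigned char *mem)
{
    assert(r != NULL && mem != NULL);

    while (r->al != 0) {
        mem[r->ah] = mem[r->xl];
        r->ah++;
        r->xl++;
        r->al--;
    }
}
```

With XL = 8, AH = 40 and AL = 16 on entry, the model leaves XL = 24,
AH = 56 and AL = 0, which corresponds to the three register sets in the
quoted parallel.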