On Thu, Apr 26, 2012 at 6:16 PM, Paulo J. Matos <pa...@matos-sorge.com> wrote:
> Hi,
>
> I am facing a problem with GCC47 register allocation and my movmemqi.
> GCC46 dealt with it very well, but GCC47 keeps throwing register spill
> failures at me.
>
> My backend has very few registers. 3 chip registers in total (class
> CHIP_REGS), one of them (XL) is used for memory references (class ADDR_REGS)
> and the other two (AL, AH) are for normal use (DATA_REGS), so CHIP_REGS =
> ADDR_REGS U DATA_REGS.
>
> There are a couple of other memory mapped registers, but all loads and
> stores go through CHIP_REGS.
>
> My chip has a block copy instruction which needs the source address in XL,
> the destination address in AH and the count in AL. My movmemqi is similar
> to movmemsi in rx.
>
> (define_expand "movmemqi"
>  [(use (match_operand:BLK 0 "memory_operand"))
>   (use (match_operand:BLK 1 "memory_operand"))
>   (use (match_operand:QI 2 "general_operand"))
>   (use (match_operand:QI 3 "general_operand"))]
>  ""
> {
>    rtx dst_addr = XEXP(operands[0], 0);
>    rtx src_addr = XEXP(operands[1], 0);
>    rtx dst_reg = gen_rtx_REG(QImode, RAH);
>    rtx src_reg = gen_rtx_REG(QImode, RXL);
>    rtx cnt_reg = gen_rtx_REG(QImode, RAL);
>
>    emit_move_insn(cnt_reg, operands[2]);
>
>    if(GET_CODE(dst_addr) == PLUS)
>    {
>        emit_move_insn(dst_reg, XEXP(dst_addr, 0));
>        emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
>    }
>    else
>        emit_move_insn(dst_reg, dst_addr);
>
>    if(GET_CODE(src_addr) == PLUS)
>    {
>        emit_move_insn(src_reg, XEXP(src_addr, 0));
>        emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
>    }
>    else
>        emit_move_insn(src_reg, src_addr);
>
>    emit_insn(gen_bc2());
>
>    DONE;
> })
>
> (define_insn "bc2"
>  [(set (reg:QI RAL) (const_int 0))
>   (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
>   (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
>   (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
>  ""
>  "bc2")
>
> The parallel in bc2 describes what the bc2 chip instruction modifies: it
> copies the block at XL to AH, moves XL to point to the end of the source
> block and AH to point to the end of the destination block, and sets AL to 0.
>
> The C code
> int **
> t25 (int *d, int **s)
> {
>  memcpy (d, *s, 16);
>  return s;
> }
>
> turns into the following after asmcons (-Os passed in):
> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn 2 5 3 2 (parallel [
>            (set (reg/v/f:QI 22 [ d ])
>                (reg:QI 1 AL [ d ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:3 6 {*movqi}
>     (expr_list:REG_DEAD (reg:QI 1 AL [ d ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 3 2 4 2 (parallel [
>            (set (reg/v/f:QI 23 [ s ])
>                (reg:QI 0 AH [ s ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:3 6 {*movqi}
>     (expr_list:REG_DEAD (reg:QI 0 AH [ s ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
>
> (insn 7 4 8 2 (parallel [
>            (set (reg/f:QI 24 [ *s_1(D) ])
>                (mem/f:QI (reg/v/f:QI 23 [ s ]) [2 *s_1(D)+0 S1 A16]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_UNUSED (reg:CC 13 CC)
>        (nil)))
>
> (insn 8 7 9 2 (parallel [
>            (set (reg:QI 1 AL)
>                (const_int 16 [0x10]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_UNUSED (reg:CC 13 CC)
>        (nil)))
>
> (insn 9 8 10 2 (parallel [
>            (set (reg:QI 0 AH)
>                (reg/v/f:QI 22 [ d ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_DEAD (reg/v/f:QI 22 [ d ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 10 9 11 2 (parallel [
>            (set (reg:QI 3 X)
>                (reg/f:QI 24 [ *s_1(D) ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_DEAD (reg/f:QI 24 [ *s_1(D) ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 11 10 16 2 (parallel [
>            (set (reg:QI 1 AL)
>                (const_int 0 [0]))
>            (set (mem:BLK (reg:QI 0 AH) [0 A16])
>                (mem:BLK (reg:QI 3 X) [0 A16]))
>            (set (reg:QI 3 X)
>                (plus:QI (reg:QI 3 X)
>                    (reg:QI 1 AL)))
>            (set (reg:QI 0 AH)
>                (plus:QI (reg:QI 0 AH)
>                    (reg:QI 1 AL)))
>        ]) memcpy.i:4 21 {bc2}
>     (expr_list:REG_UNUSED (reg:QI 3 X)
>        (expr_list:REG_UNUSED (reg:QI 1 AL)
>            (expr_list:REG_UNUSED (reg:QI 0 AH)
>                (nil)))))
>
> (insn 16 11 19 2 (parallel [
>            (set (reg/i:QI 1 AL)
>                (reg/v/f:QI 23 [ s ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:6 6 {*movqi}
>     (expr_list:REG_DEAD (reg/v/f:QI 23 [ s ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 19 16 0 2 (use (reg/i:QI 1 AL)) memcpy.i:6 -1
>     (nil))
>
> Pass ira starts by reporting:
> ;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)
>
> starting the processing of deferred insns
> ending the processing of deferred insns
> df_analyze called
> Building IRA IR
> starting the processing of deferred insns
> ending the processing of deferred insns
> df_analyze called
> init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))
>
> Pass 0 for finding pseudo/allocno costs
>
>    a1 (r24,l0) best ADDR_REGS, allocno ADDR_REGS
>    a0 (r23,l0) best ADDR_REGS, allocno ADDR_REGS
>    a2 (r22,l0) best GENERAL_REGS, allocno GENERAL_REGS
>
>  a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:4000 CHIP_REGS:4000
> FAKE_REGS:4000 MEM_REGS:4000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
>  a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0
> FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
>  a2(r22,l0) costs: ADDR_REGS:0 DATA_REGS:0 STACK_REGS:0 CHIP_REGS:0
> FAKE_REGS:0 MEM_REGS:0 GENERAL_REGS:2000 ALL_REGS:2000 MEM:7000
>
>
> Pass 1 for finding pseudo/allocno costs
>
>    r24: preferred ADDR_REGS, alternative NO_REGS, allocno ADDR_REGS
>    r23: preferred ADDR_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
>    r22: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>
>  a0(r23,l0) costs: ADDR_REGS:0 DATA_REGS:4000 STACK_REGS:6000 CHIP_REGS:4000
> FAKE_REGS:6000 MEM_REGS:6000 GENERAL_REGS:6000 ALL_REGS:6000 MEM:13000
>  a1(r24,l0) costs: ADDR_REGS:-1000 DATA_REGS:0 STACK_REGS:2000 CHIP_REGS:0
> FAKE_REGS:2000 MEM_REGS:2000 GENERAL_REGS:2000 ALL_REGS:2000 MEM:-1000
>  a2(r22,l0) costs: GENERAL_REGS:2000 MEM:7000
>
>   Insn 19(l0): point = 0
>   Insn 16(l0): point = 2
>   Insn 11(l0): point = 4
>   Insn 10(l0): point = 6
>   Insn 9(l0): point = 8
>   Insn 8(l0): point = 10
>   Insn 7(l0): point = 12
>   Insn 3(l0): point = 14
>   Insn 2(l0): point = 16
>  a0(r23): [3..14]
>  a1(r24): [7..12]
>  a2(r22): [9..16]
> Compressing live ranges: from 19 to 2 - 10%
> Ranges after the compression:
>  a0(r23): [0..1]
>  a1(r24): [0..1]
>  a2(r22): [0..1]
> +++Allocating 12 bytes for conflict table (uncompressed size 12)
> ;; a0(r23,l0) conflicts: a1(r24,l0) a2(r22,l0)
> ;;     total conflict hard regs: 0 1 3
> ;;     conflict hard regs: 0 1 3
>
> ;; a1(r24,l0) conflicts: a0(r23,l0) a2(r22,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
>
> ;; a2(r22,l0) conflicts: a0(r23,l0) a1(r24,l0)
> ;;     total conflict hard regs: 0 1
> ;;     conflict hard regs: 0 1
>
>  regions=1, blocks=3, points=2
>    allocnos=3 (big 0), copies=0, conflicts=0, ranges=3
>
> **** Allocnos coloring:
>
>
>  Loop 0 (parent -1, header bb2, depth 0)
>    bbs: 2
>    all: 0r23 1r24 2r22
>    modified regnos: 22 23 24
>    border:
>    Pressure: GENERAL_REGS=4
>    Hard reg set forest:
>      0:( 0 1 3 7-12)@0
>        1:( 3 7-12)@18000
>          2:( 7-12)@26000
>          3:( 3)@0
>      Allocno a0r23 of GENERAL_REGS(9) has 6 avail. regs  7-12, node: 7-12
> (confl regs =  0-6 13-12)
>      Allocno a1r24 of ADDR_REGS(1) has 1 avail. regs  3, node:  3 (confl
> regs =  0-2 4-12)
>      Allocno a2r22 of GENERAL_REGS(9) has 7 avail. regs  3 7-12, node:  3
> 7-12 (confl regs =  0-2 4-6 13-12)
>      Pushing a2(r22,l0)(cost 0)
>        Making a1(r24,l0) colorable
>      Pushing a1(r24,l0)(cost 0)
>      Pushing a0(r23,l0)(cost 0)
>      Popping a0(r23,l0)  -- assign reg 7
>      Popping a1(r24,l0)  -- assign reg 3
>      Popping a2(r22,l0)  -- assign reg 8
> Disposition:
>    2:r22  l0     8    0:r23  l0     7    1:r24  l0     3
> New iteration of spill/restore move
> +++Costs: overall 7000, reg 7000, mem 0, ld 0, st 0, move 0
> +++       move loops 0, new jumps 0
>
>
> Followed by reload breaking after a spill it can't handle:
> ;; Function t25 (t25, funcdef_no=0, decl_uid=1309, cgraph_uid=0)
>
> insn=2, live_throughout: 0, 5, dead_or_set: 1, 22
> insn=3, live_throughout: 5, 22, dead_or_set: 0, 23
> insn=7, live_throughout: 5, 22, 23, dead_or_set: 24
> insn=8, live_throughout: 5, 22, 23, 24, dead_or_set: 1
> insn=9, live_throughout: 1, 5, 23, 24, dead_or_set: 0, 22
> insn=10, live_throughout: 0, 1, 5, 23, dead_or_set: 3, 24
> insn=11, live_throughout: 5, 23, dead_or_set: 0, 1, 3
> insn=16, live_throughout: 5, dead_or_set: 1, 23
> insn=19, live_throughout: 1, 5, dead_or_set:
> init_insns for 24: (insn_list:REG_DEP_TRUE 7 (nil))
> changing reg in insn 2
> changing reg in insn 9
> changing reg in insn 3
> changing reg in insn 16
> changing reg in insn 7
> changing reg in insn 7
> changing reg in insn 7
> changing reg in insn 10
> Spilling for insn 7.
> Using reg 3 for reload 0
>      Try Assign 24(a1), cost=0
> changing reg in insn 7
> changing reg in insn 10
>  Register 24 now on stack.
>
> Spilling for insn 7.
> Using reg 3 for reload 0
> Using reg 3 for reload 1
> Using reg 1 for reload 4
> Spilling for insn 10.
> reload failure for reload 0
>
> Reloads for insn # 10
> Reload 0: reload_in (QI) = (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
>        ADDR_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
>        reload_in_reg: (reg/v/f:QI 7 @H'fff8 [orig:23 s ] [23])
> Reload 1: CHIP_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), optional, can't
> combine, secondary_reload_p
> Reload 2: reload_in (QI) = (mem/f:QI (reg/v/f:QI 7 @H'fff8 [orig:23 s ]
> [23]) [2 *s_1(D)+0 S1 A16])
>        GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), optional, can't combine
>        reload_in_reg: (reg/f:QI 24 [ *s_1(D) ])
>        secondary_in_reload = 1
>
>
>
> What's interesting is that GCC46 manages this fine since during IRA it
> reports:
> Pass 0 for finding pseudo/allocno costs
>
>    a1 (r24,l0) best ADDR_REGS, cover GENERAL_REGS
>    a0 (r23,l0) best ADDR_REGS, cover GENERAL_REGS
>    a2 (r22,l0) best GENERAL_REGS, cover GENERAL_REGS
>
> This differs from what GCC47 does and seems to work better.
> I would like help on how to best handle this situation under GCC47.

How about not providing movmem at all? Yours looks open-coded and not in
any way "optimized".

Richard.

> Cheers,
>
> --
> PMatos
>
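
For readers following the thread, the bc2 behaviour described in the quoted message (copy AL units from the address in XL to the address in AH, advance both address registers by the count, leave AL at 0) can be modelled in C roughly as below. This is only a sketch of the stated semantics, not the backend's code: `struct chip`, `MEM_WORDS`, `bc2_model` and the 16-bit register width are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 256u  /* hypothetical memory size, for illustration only */

/* Hypothetical model of the three chip registers plus a small memory. */
struct chip {
    uint16_t al;              /* AL: count */
    uint16_t ah;              /* AH: destination address */
    uint16_t xl;              /* XL: source address */
    uint16_t mem[MEM_WORDS];
};

/* Model of bc2 as described in the quoted message: copy AL units from
   [XL] to [AH]; afterwards XL and AH have each advanced by the original
   count and AL is 0.  Addresses wrap at MEM_WORDS (an assumption). */
void bc2_model(struct chip *c)
{
    assert(c != NULL);
    while (c->al != 0) {
        c->mem[c->ah % MEM_WORDS] = c->mem[c->xl % MEM_WORDS];
        c->ah++;
        c->xl++;
        c->al--;
    }
}
```

All three hard registers are both inputs and outputs of the instruction, which is why the expander in the quoted message loads AL, AH and XL explicitly before emitting bc2.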
