Hi Vladimir, An issue has been identified with LRA when running CPU2006 h264ref benchmark.
I'll try to describe what the issue is and a fix applied as it is very difficult to reproduce it and it is next to impossible to create a narrowed testcase on top of the source code restrictions. The concerned LRA code in lra-constraints.c is the following: if (GET_CODE (*loc) == SUBREG) { reg = SUBREG_REG (*loc); byte = SUBREG_BYTE (*loc); if (REG_P (reg) /* Strict_low_part requires reload the register not the sub-register. */ && (curr_static_id->operand[i].strict_low || (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (reg)) && (hard_regno = get_try_hard_regno (REGNO (reg))) >= 0 && (simplify_subreg_regno (hard_regno, GET_MODE (reg), byte, mode) < 0) && (goal_alt[i] == NO_REGS || (simplify_subreg_regno (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0)))) { loc = &SUBREG_REG (*loc); mode = GET_MODE (*loc); } } The above works just fine when we deal with strict_low_part or a subreg smaller than a word. However, multi-word operations that were emitted as a sequence of operations on word sized parts of the DImode register appears to expose a problem with LRA e.g. '(set (subreg: SI (reg: DI)) ...)'. LRA does not realize that it actually uses the other halve of the DI-mode register leading to a situation where it modifies one halve of the result and spills the whole register with the other halve undefined. In the dump I can see the following: Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 1487: r1552:DI#4=r1404:SI+r1509:SI REG_DEAD r1509:SI REG_DEAD r1404:SI Inserting insn reload after: 1735: r521:DI=r1552:DI There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted to load the value before modifying it but it is spilled. As it is a multi-word register, the split pass emits an additional instruction to load the whole 64-bit value but since one halve was modified, only register $20 appears in the live-in set. In contrast to $20, $21 is being used but not added to the live-in set. ... ;; live in 4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13] [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25] 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec] ... (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521]) (mem/c:SI (plus:SI (reg/f:SI 29 $sp) (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288 {*movsi_internal} (nil)) (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ]) (mem/c:SI (plus:SI (reg/f:SI 29 $sp) (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288 {*movsi_internal} (nil)) ... The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT to treat the pseudo register (r1552 in this case) as input and LRA will be forced to insert a reload before modifying its contents. Handling of strict_low_part case is fine as the operand is described in the MD pattern as IN_OUT through modifiers. With the above change in place, we get a reload before assignment: Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 1487: r1552:DI#4=r1404:SI+r1509:SI REG_DEAD r1509:SI REG_DEAD r1404:SI Inserting insn reload before: 1735: r1552:DI=r521:DI Inserting insn reload after: 1736: r521:DI=r1552:DI and the benchmark happily passes the runtime check. The question is whether changing the type to OP_INOUT is the correct and valid fix? Regards, Robert 2015-01-14 Robert Suchanek <robert.sucha...@imgtec.com> gcc/ * lra-constraints.c (curr_insn_transform): Change the type of a reload pseudo to OP_INOUT. --- gcc/lra-constraints.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index ec28b7f..018968b 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -3798,6 +3798,7 @@ curr_insn_transform (void) (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0))))) { + type = OP_INOUT; loc = &SUBREG_REG (*loc); mode = GET_MODE (*loc); } -- 1.7.9.5