On 07/25/2016 08:31 PM, Kito Cheng wrote:
Hi Jeff:
Oops, the patch is attached this time. I hit this bug in
gcc.dg/torture/vshuf-v2di.c with our nds32 internal branch.
Hi Richard:
I think we really need the reg dead note for some optimizations. By the
way, here is our split pattern:
(define_split
  [(set (match_operand:DI 0 "nds32_general_register_operand" "")
        (match_operand:DI 1 "nds32_general_register_operand" ""))]
  "find_regno_note (insn, REG_UNUSED, REGNO (operands[0])) != NULL
   || find_regno_note (insn, REG_UNUSED, REGNO (operands[0]) + 1) != NULL"
  [(set (match_dup 0) (match_dup 1))]
{
  rtx dead_note = find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
  HOST_WIDE_INT offset;
  if (dead_note == NULL_RTX)
    offset = 0;
  else
    offset = 4;
  operands[0] = simplify_gen_subreg (SImode, operands[0], DImode, offset);
  operands[1] = simplify_gen_subreg (SImode, operands[1], DImode, offset);
})
This seems better suited as a generic optimization than hidden away in a
backend.
AFAICT you're just noticing a word of the output operand is dead and
eliding the load/store for that word.
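To make that concrete, here is a minimal C sketch of the transformation as I read it. It is a toy model, not GCC code: the dreg type, the LOW_UNUSED/HIGH_UNUSED flags, and both function names are hypothetical, and it assumes the word at byte offset 0 lives in the first register of the pair, as the offsets in your pattern suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode value held in a register pair:
   word[0] sits at byte offset 0, word[1] at byte offset 4.  */
typedef struct { uint32_t word[2]; } dreg;

/* Stand-ins for REG_UNUSED notes on the two destination registers.  */
enum { LOW_UNUSED = 1u << 0, HIGH_UNUSED = 1u << 1 };

/* Mirror the offset choice in the split pattern: if the first
   register of the pair is unused, move the word at offset 4,
   otherwise move the word at offset 0.  As in the pattern's
   condition, at least one word must be unused.  */
static int live_word_offset(unsigned unused_mask)
{
    assert(unused_mask & (LOW_UNUSED | HIGH_UNUSED));
    return (unused_mask & LOW_UNUSED) ? 4 : 0;
}

/* Copy only the words that are still live and return how many
   word moves were needed.  A generic pass would do the equivalent
   on RTL instead of on values.  */
static int move_live_words(dreg *dst, const dreg *src, unsigned unused_mask)
{
    int moves = 0;
    for (int i = 0; i < 2; i++) {
        if (unused_mask & (1u << i))
            continue;            /* this word is never read: skip it */
        dst->word[i] = src->word[i];
        moves++;
    }
    return moves;
}
```

With LOW_UNUSED set, only the word at offset 4 is copied, so a two-word move shrinks to a single SImode move. Nothing in this is target specific, which is why it looks like a candidate for a generic pass.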
In fact, I'm a bit surprised nothing has optimized this away by the time
reload/LRA is done. You might spend some time walking through
lower-subreg to see if it can be easily extended to handle your case.
Jeff