On 27/04/12 11:49, Richard Guenther wrote:
> Yes, it inlines it. You may want to look at s390 which I believe has
> a similar block-copy operation.
>
> Richard.
I looked at s390. Although its block copy instruction seems similar,
ours is much more restrictive: it expects its values in specific
registers, instead of allowing the register numbers to be passed to
the instruction (as the s390 mvcle insn does).

I tried not to hardcode the registers in the insn pattern, but since
the instruction requires specific registers as operands, I had to
create a register class per register (each containing a single
register) and then define a register constraint for each of those
classes. This turned out not to work: RA breaks even earlier than
before. Here's what I did:
(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand")    ; destination
        (match_operand:BLK 1 "memory_operand"))   ; source
   (use (match_operand:QI 2 "general_operand"))]  ; count
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{
  rtx dst_addr = XEXP(operands[0], 0);
  rtx src_addr = XEXP(operands[1], 0);
  rtx dst_reg = gen_reg_rtx(QImode); /* will be forced into AH */
  rtx src_reg = gen_reg_rtx(QImode); /* will be forced into XL */
  rtx cnt_reg = gen_reg_rtx(QImode); /* will be forced into AL */

  emit_move_insn(cnt_reg, operands[2]);

  if(GET_CODE(dst_addr) == PLUS)
    {
      emit_move_insn(dst_reg, XEXP(dst_addr, 0));
      emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
    }
  else
    emit_move_insn(dst_reg, dst_addr);

  if(GET_CODE(src_addr) == PLUS)
    {
      emit_move_insn(src_reg, XEXP(src_addr, 0));
      emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
    }
  else
    emit_move_insn(src_reg, src_addr);

  emit_insn(gen_bc2(dst_reg, src_reg, cnt_reg));
  DONE;
})

(define_insn "bc2"
  [(set (match_operand:QI 0 "register_operand" "=l")
        (const_int 0))
   (set (mem:BLK (match_operand:QI 1 "register_operand" "=h"))
        (mem:BLK (match_operand:QI 2 "register_operand" "=x")))
   (set (match_dup 2)
        (plus:QI (match_dup 2) (match_dup 0)))
   (set (match_dup 1)
        (plus:QI (match_dup 1) (match_dup 0)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

Constraints l, h and x correspond to singleton classes for registers
AL, AH and XL respectively. I think the problem here is the RA's
inability to deal with such a constrained register set. I want to use
our block copy instruction rather than disable movmemqi and setmemqi
and therefore fall back to calling memcpy, so is there anything I can
try in order to tune the RA?
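
For reference, here is a rough, untested sketch of the constraint
definitions together with one variation of bc2 that might be worth
trying. The class names AL_REG, AH_REG and XL_REG are placeholders
rather than the names actually used in the port. The variation rests
on two general GCC conventions: an operand that is both read and
written is normally marked "+" rather than "=", and gen_bc2 takes its
arguments in operand-number order. Whether either point affects how
the RA copes with the singleton classes is an assumption, not
something verified here.

```
;; constraints.md (sketch; class names are hypothetical placeholders)
(define_register_constraint "l" "AL_REG"
  "Register AL (block copy count).")
(define_register_constraint "h" "AH_REG"
  "Register AH (block copy destination address).")
(define_register_constraint "x" "XL_REG"
  "Register XL (block copy source address).")

;; Variant of bc2 with read-write ("+") operands instead of "=".
(define_insn "bc2"
  [(set (match_operand:QI 0 "register_operand" "+l")             ; count, in AL
        (const_int 0))
   (set (mem:BLK (match_operand:QI 1 "register_operand" "+h"))   ; destination, in AH
        (mem:BLK (match_operand:QI 2 "register_operand" "+x")))  ; source, in XL
   (set (match_dup 2)
        (plus:QI (match_dup 2) (match_dup 0)))
   (set (match_dup 1)
        (plus:QI (match_dup 1) (match_dup 0)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

;; With the operand numbering above, the expander call would pass the
;; count first:
;;   emit_insn (gen_bc2 (cnt_reg, dst_reg, src_reg));
```

Each singleton class would also need its usual entries in enum
reg_class, REG_CLASS_NAMES and REG_CLASS_CONTENTS.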
--
PMatos