Yes, most of the code is borrowed from the ARM port, with the expected mode of the loop pseudo changed to DImode. The loop_canon_p function called from sms_schedule already makes sure that SMS is only performed on innermost loops. Still, I think it's a good idea to implement the TARGET_CAN_USE_DOLOOP_P hook here as well. See my updated patch below. How about this one?
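For reference, the generic helper the hook is pointed at simply restricts doloop generation to loops that contain no nested loop. Roughly, it behaves like the sketch below (a paraphrase of the can_use_doloop_if_innermost fallback in targhooks.c as I recall it, not an excerpt from this patch, so the exact upstream signature may differ slightly):

  /* Paraphrase of the generic TARGET_CAN_USE_DOLOOP_P fallback.  The
     iteration-count arguments are ignored; the doloop pattern is allowed
     only when the loop contains no nested loop (loop_depth == 1).  */
  bool
  can_use_doloop_if_innermost (const widest_int &iterations ATTRIBUTE_UNUSED,
                               const widest_int &iterations_max ATTRIBUTE_UNUSED,
                               unsigned int loop_depth,
                               bool entered_at_top ATTRIBUTE_UNUSED)
  {
    return loop_depth == 1;
  }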
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 217394)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  <felix.y...@huawei.com>
+
+	* config/aarch64/aarch64.md (doloop_end): New pattern.
+	* config/aarch64/aarch64.c (TARGET_CAN_USE_DOLOOP_P): Implement.
+
 2014-11-11  Andrew Pinski  <apin...@cavium.com>
 
 	Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	(revision 217394)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -4087,6 +4087,47 @@
   [(set_attr "type" "mrs")])
 
+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand "doloop_end"
+  [(use (match_operand 0 "" ""))	; loop pseudo
+   (use (match_operand 1 "" ""))]	; label
+  ""
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+     where (1) the control part consists of all insns defining and/or
+     using a certain 'count' register and (2) the loop count can be
+     adjusted by modifying this register prior to the loop.
+     ??? The possible introduction of a new block to initialize the
+     new IV can potentially affect branch optimizations.  */
+  if (optimize > 0 && flag_modulo_sched)
+    {
+      rtx s0;
+      rtx bcomp;
+      rtx loc_ref;
+      rtx cc_reg;
+      rtx insn;
+      rtx cmp;
+
+      if (GET_MODE (operands[0]) != DImode)
+	FAIL;
+
+      s0 = operands[0];
+      insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+      cmp = XVECEXP (PATTERN (insn), 0, 0);
+      cc_reg = SET_DEST (cmp);
+      bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+      loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[1]);
+      emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+				   gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+							 loc_ref, pc_rtx)));
+      DONE;
+    }
+  else
+    FAIL;
+})
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 217394)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"

> On 17 November 2014 07:59, Yangfei (Felix) <felix.y...@huawei.com> wrote:
>
> >> +2014-11-13  Felix Yang  <felix.y...@huawei.com>
> >> +
> >> +	* config/aarch64/aarch64.md (doloop_end): New pattern.
> >> +
>
> This looks like a straight copy of the ARM code, but without the
> TARGET_CAN_USE_DOLOOP_P definition.  If the reason for including this
> code is for the benefit of modulo-sched then should the hook be defined
> to limit the use of this pattern to innermost loops only?
>
> Cheers
> /Marcus
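P.S. In case it helps review, below is the sort of toy kernel the new doloop_end expander is aimed at. It is purely illustrative and not part of the patch; the dg- options and the function name are mine:

  /* { dg-do compile } */
  /* { dg-options "-O2 -fmodulo-sched" } */

  /* An innermost counted loop.  With -fmodulo-sched enabled, the
     doloop_end expander above is expected to rewrite the loop control
     into a subtract-one-and-branch sequence, provided the loop counter
     ends up in DImode.  */
  int
  sum (const int *a, long n)
  {
    long i;
    int s = 0;

    for (i = 0; i < n; i++)
      s += a[i];
    return s;
  }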