I suppose my main question about this is: did you consider extending the cfgloop machinery instead of writing new loop discovery code?
Also, loop discovery seems to be based on the liveness of the iterator, but what happens if the iterator is a general register that is spilled for part of the loop? Obviously it's best if that doesn't happen, but if it does, it looks like you might have a "loop" in which the recorded head isn't actually considered to be a member of the loop. I just wondered if it would be better to discover the loop "normally" and then analyse the use of the iterator. Bernd Schmidt <ber...@codesourcery.com> writes: > +typedef struct loop_info *loop_info; That'll fall foul of -Wc++-compat. I wonder whether we should use a more specialised name, such as "hw_doloop", and similarly in the function names. "loop_info" and "bb_in_loop" sound more like things that belong in cfgloop. > +struct hw_doloop_hooks > +{ > + /* Examine INSN. If it is a suitable doloop_end pattern, return the > + iteration register. Otherwise, return NULL_RTX. */ > + rtx (*end_pattern_reg) (rtx insn); > + /* Called after the hw-doloop pass has finished reordering the basic block > + structure. The target can defer some of its initializations to this > + hook if necessary. This function should return true if the CFG has been > + changed and loops must be rediscovered. */ > + bool (*after_reorder) (void); > + /* Optimize LOOP. Return true if successful, false if the loop should be > + marked bad. */ > + bool (*opt) (loop_info loop); > + /* Handle a loop that was marked bad for any reason. This could be used to > + split the doloop_end pattern. */ > + void (*fail) (loop_info loop); > +}; For the record, I wondered at first whether these should be target hooks, but I agree they shouldn't. md_reorg is special enough that this new infrastructure really does need to be a subroutine of md_reorg rather than a new pass. Having separate hooks is more flexible in that case, because it allows the hooks to rely on data that is set up by md_reorg beforehand. However, I think there needs to be a bit more overview commentary somewhere. The main comment is above reorg_loops(): > +/* Run from machine_dependent_reorg, this pass looks for doloop_end > + insns, analyzes the loop structure, and tries to rewrite the RTL of > + these loops so that the target can create whatever hardware looping > + instructions are available. > + > + DO_REORDER should be set to true if we should try to use the > + reorder_loops function to ensure the loop end occurs after the loop > + start. HOOKS is used to pass in target specific hooks; see > + hw-doloop.h. */ I think the combination of this comment and the comments in the hooks ought to be self-contained enough that a back-end writer could tell from them alone how to call the subroutine, and how the hooks need to be defined. Would you mind expanding it a bit? E.g. I found the after_reorder comment a bit hard to follow because there's no single comment that describes what reordering is done, and how the backend might want to react to it. I think it would also help to describe (broadly) what kind of hardware instruction you're trying to support. The current code seems to assume that the iterator is a single hard register. It might be worth stating (and asserting?) that somewhere. I also wondered whether the comment ought to be at the head of the file, but given that it's a subroutine rather than a stand-alone pass, there's not really a clear answer... > +/* Return true if BB is part of LOOP. */ > +bool > +bb_in_loop_p (loop_info loop, basic_block bb) Silly nitlet 1, but the file seems a bit inconsistent about the number of lines between the comment and the start of function. (0 here and a few other places, but mostly 1). > +/* Scan the blocks of LOOP (and its inferiors), and record things such as > + hard registers set, jumps out of and within the loop. */ > + > +static void > +scan_loop (loop_info loop) > +{ > + unsigned ix; > + basic_block bb; > + > + if (loop->head == NULL) > + { > + gcc_assert (loop->bad); > + return; > + } > + CLEAR_HARD_REG_SET (loop->regs_set_in_loop); > + loop->iter_reg_used = false; > + loop->jumps_within = false; > + loop->jumps_outof = false; > + loop->has_call = false; > + loop->has_asm = false; > + loop->iter_reg_used_outside = false; > + loop->timode_insns = 0; Since the initialisation is split over two places, the zero initialisations above and the mixed initialisations here: > + loop->tail = tail_bb; > + loop->loop_end = tail_insn; > + loop->last_insn = NULL_RTX; > + loop->iter_reg = reg; > + loop->depth = loop->length = 0; > + loop->visited = false; > + loop->bad = false; > + loop->outer = NULL; > + loop->loops = NULL; > + loop->incoming = VEC_alloc (edge, gc, 2); > + loop->end_label = NULL_RTX; > + loop->start_label = JUMP_LABEL (tail_insn); it might be better to allocate using XCNEW instead. > + if (GET_MODE (insn) == TImode) > + loop->timode_insns++; I assume this is used by the C6X port? The information is only going to be meaningful to ports like C6X that treat scheduling as a subroutine of md_reorg, so I wonder whether it belongs here. If it stays, I think there should be a comment above the field definition saying that it is only meaningful in that case. > + CLEAR_HARD_REG_SET (set_this_insn); > + note_stores (PATTERN (insn), record_hard_reg_sets, > + &set_this_insn); > + for (x = REG_NOTES (insn); x; x = XEXP (x, 1)) > + if (REG_NOTE_KIND (x) == REG_INC) > + record_hard_reg_sets (XEXP (x, 0), NULL, &set_this_insn); > + > + if (CALL_P (insn)) > + { > + loop->has_call = true; > + for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1)) > + if (GET_CODE (XEXP (x, 0)) == CLOBBER) > + record_hard_reg_sets (XEXP (XEXP (x, 0), 0), NULL, > + &set_this_insn); > + > + } I think this should be iterating over DF_INSN_DEFS instead. If that isn't accurate for some reason, it needs to be fixed... > + if (EDGE_COUNT (tail_bb->succs) != 2) > + { > + loop->bad = true; > + loop->head = loop->successor = NULL; What's the significance of clearing "head" and "successor" as well as setting "bad"? > + works = VEC_alloc (basic_block,heap,20); Missing spaces. > + if (!VEC_space (basic_block, works, 1)) > + { > + if (dwork) > + { > + VEC_block_remove (basic_block, works, 0, dwork); > + dwork = 0; > + } > + else > + VEC_reserve (basic_block, heap, works, 1); > + } > + VEC_quick_push (basic_block, works, succ); I'm being dense, sorry, but could you walk me through what this is doing (i.e. why it isn't a simple VEC_safe_push)? > + for (pass = 0, retry = 1; retry && pass < 2; pass++) > + { > + edge e; > + edge_iterator ei; > + bool first = true; > + retry = 0; > + > + FOR_EACH_EDGE (e, ei, loop->incoming) > + { > + if (first) > + { > + loop->incoming_src = e->src; > + loop->incoming_dest = e->dest; > + first = false; > + } > + else > + { > + if (e->dest != loop->incoming_dest) > + loop->incoming_dest = NULL; > + if (e->src != loop->incoming_src) > + loop->incoming_src = NULL; > + } > + if (loop->incoming_src == NULL && loop->incoming_dest == NULL) > + { > + if (pass == 0) > + { > + if (dump_file) > + fprintf (dump_file, > + ";; retrying loop %d with forwarder blocks\n", > + loop->loop_no); > + retry = 1; > + break; > + } > + loop->bad = 1; > + if (dump_file) > + fprintf (dump_file, > + ";; can't find suitable entry for loop %d\n", > + loop->loop_no); > + goto out; > + } > + } > + if (retry) > + { > + retry = 0; > + FOR_EACH_EDGE (e, ei, loop->incoming) > + { > + if (forwarder_block_p (e->src)) > + { > + edge e2; > + edge_iterator ei2; > + > + if (dump_file) > + fprintf (dump_file, > + ";; Adding forwarder block %d to loop %d and > retrying\n", > + e->src->index, loop->loop_no); > + VEC_safe_push (basic_block, heap, loop->blocks, e->src); > + bitmap_set_bit (loop->block_bitmap, e->src->index); > + FOR_EACH_EDGE (e2, ei2, e->src->preds) > + VEC_safe_push (edge, gc, loop->incoming, e2); > + VEC_unordered_remove (edge, loop->incoming, ei.index); > + retry = 1; > + break; > + } > + } > + if (!retry) > + { > + if (dump_file) > + fprintf (dump_file, ";; No forwarder blocks found\n"); > + loop->bad = 1; > + } > + } > + } > + } I think this would be easier to follow if the discovery part of the loop_incoming loop was split out: static bool process_incoming_edges_p (loop_info loop) { edge e; edge_iterator ei; bool first = true; FOR_EACH_EDGE (e, ei, loop->incoming) { if (first) { loop->incoming_src = e->src; loop->incoming_dest = e->dest; first = false; } else { if (e->dest != loop->incoming_dest) loop->incoming_dest = NULL; if (e->src != loop->incoming_src) loop->incoming_src = NULL; if (loop->incoming_src == NULL && loop->incoming_dest == NULL) return false; } } return true; } static bool add_forwarder_block (loop_info loop) { edge e; edge_iterator ei; FOR_EACH_EDGE (e, ei, loop->incoming) if (forwarder_block_p (e->src)) { edge e2; edge_iterator ei2; if (dump_file) fprintf (dump_file, ";; Adding forwarder block %d to loop %d and retrying\n", e->src->index, loop->loop_no); VEC_safe_push (basic_block, heap, loop->blocks, e->src); bitmap_set_bit (loop->block_bitmap, e->src->index); FOR_EACH_EDGE (e2, ei2, e->src->preds) VEC_safe_push (edge, gc, loop->incoming, e2); VEC_unordered_remove (edge, loop->incoming, ei.index); } return false; } and then: if (!process_incoming_edges_p (loop)) { /* Try again with forwarder blocks. */ if (dump_file) fprintf (dump_file, ";; retrying loop %d with forwarder blocks\n", loop->loop_no); if (!add_forwarder_block (loop)) { if (dump_file) fprintf (dump_file, ";; No forwarder blocks found\n"); loop->bad = 1; } else if (!process_incoming_edges_p (loop)) { if (dump_file) fprintf (dump_file, ";; can't find suitable entry for loop %d\n", loop->loop_no); loop->bad = 1; } } Could you add commentary explaining what this case is trying to handle? It seems like adding the forwarder adds more incoming_dests, so presumably the idea is to unify the incoming_srcs. Something like (in insn order): +---- B1 | | | V | +-- B2 | | | | L1 <-+ | | | | | | V | | +-> L2 | | | | | V | +---> L3 | | | V | L4 --+ | V ... So we'd then consider B2 to be part of the loop. Is that right? If so, how does it help? B2 still seems a bit of an odd-man-out, even if we pretend it's part of the loop. > + /* There's a degenerate case we can handle - an empty loop consisting > + of only a back branch. Handle that by deleting the branch. */ > + insn = JUMP_LABEL (tail); > + while (insn && !NONDEBUG_INSN_P (insn)) > + insn = NEXT_INSN (insn); > + if (insn == tail) > + { > + if (dump_file) > + { > + fprintf (dump_file, ";; degenerate loop ending at\n"); > + print_rtl_single (dump_file, tail); > + } > + delete_insn_and_edges (tail); > + continue; > + } Do we really need to handle that here? It sounds like something that should be optimised by an earlier pass. Don't you also need to check that the iterator isn't live on fallthrough? > + bb->aux = loop; Where is this used? Is it provided for the backend hooks? If so, that doesn't seem to be documented. > + bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap); > + if (bitmap_empty_p (tmp_bitmap)) > + continue; > + if (bitmap_equal_p (tmp_bitmap, other->block_bitmap)) > + { > + other->outer = loop; > + VEC_safe_push (loop_info, heap, loop->loops, other); > + } > + else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap)) > + { > + loop->outer = other; > + VEC_safe_push (loop_info, heap, other->loops, loop); > + } ->outer doesn't seem to be used anywhere, and the choice of loop looks somewhat arbitrary. We can avoid the temporary here with: if (!bitmap_intersect_p (other->block_bitmap, loop->block_bitmap)) continue; if (!bitmap_intersect_compl_p (other->block_bitmap, loop->block_bitmap)) { other->outer = loop; VEC_safe_push (loop_info, heap, loop->loops, other); } else if (!bitmap_intersect_compl_p (loop->block_bitmap, other->block_bitmap)) { loop->outer = other; VEC_safe_push (loop_info, heap, other->loops, loop); } > +#define BB_AUX_INDEX(BB) ((unsigned)(BB)->aux) I think this'll produce warnings on targets with sizeof(int) != sizeof (void *). > + /* Recreate an index for basic blocks that represents their order. */ > + for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0; > + bb != EXIT_BLOCK_PTR; > + bb = bb->next_bb, index++) > + bb->aux = (PTR) index; I think this would be clearer with FOR_EACH_BB. It seems a shame to recompute this for every loop though, especially if no changes are needed. > + if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail)) > + continue; <=? A self loop isn't a problem. > + FOR_EACH_EDGE (e, ei, loop->head->succs) > + { > + if (bitmap_bit_p (loop->block_bitmap, e->dest->index) > + && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail)) > + { > + basic_block start_bb = e->dest; > + basic_block start_prev_bb = start_bb->prev_bb; > + > + if (dump_file) > + fprintf (dump_file, ";; Moving block %d before block %d\n", > + loop->head->index, start_bb->index); > + loop->head->prev_bb->next_bb = loop->head->next_bb; > + loop->head->next_bb->prev_bb = loop->head->prev_bb; > + > + loop->head->prev_bb = start_prev_bb; > + loop->head->next_bb = start_bb; > + start_prev_bb->next_bb = start_bb->prev_bb = loop->head; > + break; > + } > + } How confident are we at this stage that the loop can use the hardware instruction (i.e. won't be marked bad)? It might be nice to have a comment about the impact of this change in terms of branching, both when we can and can't do the actual loop optimisation. Is there no possibility of running the optimisation in cfglayout mode instead? It seems from this and the forwarder block stuff above as though it might make things easier, but maybe not. > + /* Every loop contains in its list of inner loops every loop nested inside > + it, even if there are intermediate loops. This works because we're > doing > + a depth-first search here and never visit a loop more than once. */ > + for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, inner); ix++) > + { > + optimize_loop (inner, hooks); Perhaps use a worklist here, to avoid O(loops)-deep recursion? > + df_live_add_problem (); > + df_live_set_all_dirty (); > + df_analyze (); > + > + bitmap_obstack_initialize (&stack); > + > + if (dump_file) > + fprintf (dump_file, ";; Find loops, first pass\n\n"); > + > + loops = discover_loops (&stack, hooks); Is it a required part of the interface that we call things like after_reorder even if there are no loops? If so, I think that needs to be stated, otherwise we should early out if loops is null. > + if (do_reorder) > + { > + reorder_loops (loops); > + free_loops (loops); > + > + if (dump_file) > + fprintf (dump_file, ";; Find loops, second pass\n\n"); > + > + loops = discover_loops (&stack, hooks); > + } > + > + if (hooks->after_reorder) > + { > + if (hooks->after_reorder ()) > + { > + free_loops (loops); > + if (dump_file) > + fprintf (dump_file, ";; Find loops after target after_reorder\n\n"); > + > + loops = discover_loops (&stack, hooks); > + } > + } >From a naive reading, this looks a bit inefficient: the loops created after the reorder aren't passed to the after_reorder hook, and could be immediately discarded if after_reorder returns true. Maybe just set loops to null if they become invalid, and recreate them with: if (!loops) loops = discover_loops (); Depends on the answer to the early-out question above though. Richard