On Mon, Nov 9, 2015 at 6:35 PM, Bernd Schmidt <bschm...@redhat.com> wrote: > On 11/07/2015 03:44 PM, Kelvin Nilsen wrote: >> >> This is a draft patch to partially address the concerns described in >> bugzilla problem report >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is >> incomplete in the sense that there are some known shortcomings with >> nested loops which I am still working on. I am sending this out for >> comments at this time because we would like these patches to be >> integrated into the GCC 6 release and want to begin responding to >> community feedback as soon as possible in order to make the integration >> possible. > > > I'll mainly comment on style points right now. Your code generally looks > mostly good but doesn't quite follow our guidelines. In terms of logic I'm > sure there will be followup questions after the first round of points is > addressed. Others might be better qualified to review the frequencies > manipulation; for this first round I'm just assuming that the general idea > is sound (but would appreciate input). > >> 1. Before a loop body is unpeeled into a pre-header location, we >> temporarily adjust the loop body frequencies to represent the values >> appropriate for the context into which the loop body is to be >> copied. >> >> 2. After unrolling the loop body (by replicating the loop body (N-1) >> times within the loop), we recompute all frequencies associated with >> blocks contained within the loop. > > > If these are independent from each other it might be better to split up the > patch. > >> (check_loop_frequency_integrity): new helper routine >> (set_zero_probability): added another parameter >> (duplicate_loop_to_header_edge): Add code to recompute loop >> body frequencies after blocks are replicated (unrolled) into the loop >> body. Introduce certain help routines because existing infrastructure >> routines are not reliable during typical executions of >> duplicate_loop_to_header_edge(). > > > Please review our guidelines how to write ChangeLogs - capitalize, > punctuate, and document only the what, not the why. Also, linewrap manually. > >> opt_info_start_duplication (opt_info); >> + >> ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), > > > Please make sure you don't change whitespace unnecessarily. There are a few > other occurrences in the patch, and also cases where you seem to be adding > spaces to otherwise blank lines. > >> @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop) >> bitmap_clear_bit (wont_exit, may_exit_copy); >> opt_info_start_duplication (opt_info); >> >> + { >> + /* Recompute the loop body frequencies. */ >> + zero_loop_frequencies (loop); > > > No reason to start a braced block here. > >> + /* Scale the incoming frequencies according to the heuristic that >> + * the loop frequency is the incoming edge frequency divided by >> + * 0.09. This heuristic applies only to loops that iterate over a >> + * run-time value that is not known at compile time. Note that >> + * 1/.09 equals 11.1111. We'll use integer arithmetic on ten >> + * thousandths, and then divide by 10,000 after we've "rounded". >> + */ > > > Please examine the comment style in gcc - no asterisks to start new lines, > and comment terminators don't go on their own line. > >> + sum_incoming_frequencies *= 111111; /* multiply by 11.1111 */ >> + sum_incoming_frequencies += 5000; /* round by adding 0.5 */ >> + sum_incoming_frequencies /= 10000; /* convert ten thousandths >> + to ones >> + */ > > > These comments could also be improved, but really they should just be > removed since they're pretty obvious and redundant with the one before. > >> +/* Define ENABLE_CHECKING to enforce the following run-time checks. > > > "With checking enabled, the following run-time checks are performed:" > >> + * This may report false-positive errors due to round-off errors. > > > That doesn't sound good as it could lead to bootstrap failures when checking > is enabled. > >> @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo >> static bool fix_bb_placement (basic_block); >> static void fix_bb_placements (basic_block, bool *, bitmap); >> >> +/* >> + * Return true iff block is considered to reside within the loop >> + * represented by loop_ptr. >> + */ > > > Arguments are capitalized in function comments. > >> +bool >> +in_loop_p (basic_block block, struct loop *loop_ptr) >> +{ >> + basic_block *bbs = get_loop_body (loop_ptr); >> + bool result = false; >> + >> + for (unsigned int i = 0; i < loop_ptr->num_nodes; i++) >> + { >> + if (bbs[i] == block) >> + result = true; >> + } > > > I think something that starts with bb->loop_father and iterates outwards > would be more efficient.
Well, either bb->loop_father == loop_ptr or that with || flow_loop_nested_p (loop_ptr, bb->loop_father). >> +/* A list of block_ladder_rung structs is used to keep track of all the >> + * blocks visited in a depth-first recursive traversal of a control-flow >> + * graph. This list is used to detect and prevent attempts to revisit >> + * a block that is already being visited in the recursive traversal. >> + */ >> +typedef struct block_ladder_rung { >> + basic_block block; >> + struct block_ladder_rung *lower_rung; >> +} *ladder_rung_p; > > > I was going to suggest a vec rather than manually dealing with lists, but it > looks like you're using these in a rather unusual way in a recursive walk. > I'm not actually sure that what you're doing entirely prevents revisiting a > block, it only does so for a single recursive call-chain, but two separate > recursive calls from the same level could still visit the same blocks. I'm > not sure this is intended (the usual approach would be to use a (s)bitmap of > visited blocks). More comments on the whole recursion scheme below. You can also use the BB_VISITED flag. Or dfs_enumerate_from () >> +/* Return true iff an_edge represents the same source and destination >> + * blocks as another_edge. >> + */ >> +static bool >> +same_edge_p (edge an_edge, edge another_edge) >> +{ >> + return ((an_edge->src == another_edge->src) >> + && (an_edge->dest == another_edge->dest)); >> +} > > > Formatting aside (extra parentheses), I wonder why you can't just test edges > for pointer equality? Does anything make duplicate edges? We don't allow duplicate edges apart from edges with different flags (an EH edge may appear alongside a FALLTRHU one for example). Some more general comments. The patch lacks a testcase (or a few). We do have general checking of frequencies in cfg.c:check_bb_profile which will cause dump files (of the unrolling dump for example) to contain stuff like "Invalid sum of incoming frequencies..." A testcase would be sth that has them before but not after the patches. >> +static void >> +recursively_zero_frequency (struct loop *loop_ptr, vec<edge> exit_edges, >> + ladder_rung_p ladder_rung, >> + edge incoming_edge) >> +{ >> + if (incoming_edge->dest == loop_ptr->header) >> + return; >> + else if (in_edge_set_p (incoming_edge, exit_edges)) >> + return; >> + else if (in_call_chain_p (incoming_edge, ladder_rung)) >> + return; >> + else >> + { >> + struct block_ladder_rung a_rung; >> + basic_block block = incoming_edge->dest; >> + >> + a_rung.block = block; >> + a_rung.lower_rung = ladder_rung; >> + block->frequency = 0; >> + >> + edge_iterator ei; >> + edge successor; >> + FOR_EACH_EDGE (successor, ei, block->succs) >> + { >> + recursively_zero_frequency (loop_ptr, exit_edges, >> + &a_rung, successor); >> + } >> + } >> +} > > > Does this really have to be recursive, and this complicated? I think > something with a worklist and maybe a bitmap to indicate visited blocks > would be considerably easier to understand, and the recursion could be a > problem with very large testcases. > > You might want to check whether something like dfs_enumerate_from or other > functions in cfganal would be helpful. Simply iterating over all BBs returned from get_loop_body would be sufficient. >> +static bool >> +recursion_detected_p (basic_block candidate, ladder_rung_p lower_steps) { > > > Don't place the open brace on the same line as the declaration. > >> +/* Return true iff candidate is contained within the loop represented >> + * by loop_header and loop_latch. >> + * >> + * We consider the block to be within the loop if there exists a path >> + * within the control flow graph from this node to the loop's latch >> + * which does not pass through the loop's header. (If all paths to >> + * the latch pass through the loop header, then the node is contained >> + * within an outer-nested loop but not within this loop.) >> + * >> + * Note that if a candidate's successor is in the loop and the successor >> + * is not the loop header, then the candidate itself is also in the loop. >> + * If none of the successors of a candidate are in the loop, then the >> + * candidate itself is not in the loop. >> + */ >> +static bool >> +in_loop_of_header_p (basic_block candidate, basic_block loop_header, >> + basic_block loop_latch, bool start_of_recursion, >> + ladder_rung_p lower_steps) > > > This is another function that makes me wonder why it's so complicated. If > you want to know whether a basic block is inside a loop, why is looking for > it in the set returned by get_loop_body insufficient, or iterating backwards > from the block's loop_father? And even so I think there are probably better > approaches to writing this using worklists or existing helper functions, as > mentioned above. This can again be answered by using candidate->loop_father, loop_header->loop_father and flow_loop_nested_p. You don't even need to know the latch. +static vec<basic_block> +recursively_get_loop_blocks (basic_block candidate, vec<basic_block> results, + basic_block loop_header, basic_block loop_latch) +{ +/* Return a vector containing all of the blocks contained within the + * loop identified by loop_ptr. + */ +static vec<basic_block> +get_loop_blocks (struct loop *loop_ptr) +{ this is just get_loop_body ()? >> + return false; /* None of the successors was in loop >> */ > > > Move comment before statement. > >> +/* Return true iff block is an element of the block_set vector. >> + */ >> +static bool >> +in_block_set_p (basic_block block, vec<basic_block> block_set) >> +{ >> + basic_block bb; >> + unsigned int u; >> + FOR_EACH_VEC_ELT (block_set, u, bb) >> + { >> + if (bb == block) >> + return true; >> + } >> + return false; >> +} > > > You probably want to use (s)bitmaps so you don't have to do this potentially > expensive iteration. > >> + >> +/* Return a vector containing all of the edges that exit the loop >> + * represented by the loop_blocks vector. >> + */ >> +static vec<edge> >> +get_exit_edges_from_loop_blocks (vec<basic_block> loop_blocks) { > > > How does this one differ from the existing get_loop_exit_edges? Ah, I see > this: > >> + /* When zero_partial_loop_frequencies is invoked, the *loop_ptr >> + * object is not entirely coherent, so the library service >> + * get_loop_exit_edges (loop_p) cannot be called from this context. > > > Just how incoherent is the information? Is that fixable? I assume this is > the same reason why you have get_loop_blocks and are using it instead of > get_loop_body? That duplication is somewhat unfortunate and it would be nice > to avoid it. I think this is the core point that needs to be addressed > first. If this is unavoidable (and I really hope not), then it needs to be > clearly documented which set of functions should be called under which > circumstances. Just compute the sets before you mess with the loop then? Also I don't see why changing frequencies makes any of the CFG related functions not function. Richard. >> + if (!in_block_set_p (edge_dest, loop_blocks)) >> + { >> + results.safe_push (successor); >> + } > > > No braces around single statements. Please fix throughout. > >> + * Zero all frequencies for all blocks contained within the loop >> + * represented by loop_ptr which are reachable from block without >> + * passing through the block header. If block is not within the loop, >> + * this has no effect. The behavior is as outlined by the following > > > Lose the tab character in here. > >> +static void >> +recursively_increment_frequency (struct loop *loop_ptr, vec<edge> >> exit_edges, >> + ladder_rung_p ladder_rung, >> + edge incoming_edge, >> + int frequency_increment) > > > Some of these would probably fit on one line which would make this more > compact. > >> +#ifdef ENABLE_CHECKING > > > We're getting rid of this #ifdef. Define unconditionally, and use > flag_checking to determine whether to do the verification (if it is safe, > see comment above - otherwise just make this a DEBUG_FUNCTION). > >> +/** >> + * Issue a fatal error message and abort program execution. >> + */ >> +static void >> +internal (const char *msg) >> +{ >> + fprintf (stderr, "Fatal internal error: %s\n", msg); >> + exit (-1); >> +} > > > Unnecessary, we have fatal_error. > >> + * The integrity check is problematic due to round-off errors. Though > > > Another tab character. > >> @@ -1101,7 +1649,7 @@ can_duplicate_loop_p (const struct loop *loop) >> is redistributed evenly to the remaining edges coming from E->src. >> */ >> >> static void >> -set_zero_probability (edge e) >> +set_zero_probability (struct loop *loop_ptr, edge e) > > > Arguments should be documented in the function comment. > >> + * KELVIN_TODO: > > > Just "TODO" if at all. > > > Bernd