On 11/07/2015 03:44 PM, Kelvin Nilsen wrote:
This is a draft patch to partially address the concerns described in bugzilla problem report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is incomplete in the sense that there are some known shortcomings with nested loops which I am still working on. I am sending this out for comments at this time because we would like these patches to be integrated into the GCC 6 release and want to begin responding to community feedback as soon as possible in order to make the integration possible.
I'll mainly comment on style points right now. Your code generally looks mostly good but doesn't quite follow our guidelines. In terms of logic I'm sure there will be followup questions after the first round of points is addressed. Others might be better qualified to review the frequencies manipulation; for this first round I'm just assuming that the general idea is sound (but would appreciate input).
1. Before a loop body is unpeeled into a pre-header location, we temporarily adjust the loop body frequencies to represent the values appropriate for the context into which the loop body is to be copied. 2. After unrolling the loop body (by replicating the loop body (N-1) times within the loop), we recompute all frequencies associated with blocks contained within the loop.
If these are independent from each other it might be better to split up the patch.
(check_loop_frequency_integrity): new helper routine (set_zero_probability): added another parameter (duplicate_loop_to_header_edge): Add code to recompute loop body frequencies after blocks are replicated (unrolled) into the loop body. Introduce certain help routines because existing infrastructure routines are not reliable during typical executions of duplicate_loop_to_header_edge().
Please review our guidelines how to write ChangeLogs - capitalize, punctuate, and document only the what, not the why. Also, linewrap manually.
opt_info_start_duplication (opt_info); + ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
Please make sure you don't change whitespace unnecessarily. There are a few other occurrences in the patch, and also cases where you seem to be adding spaces to otherwise blank lines.
@@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop) bitmap_clear_bit (wont_exit, may_exit_copy); opt_info_start_duplication (opt_info); + { + /* Recompute the loop body frequencies. */ + zero_loop_frequencies (loop);
No reason to start a braced block here.
+ /* Scale the incoming frequencies according to the heuristic that + * the loop frequency is the incoming edge frequency divided by + * 0.09. This heuristic applies only to loops that iterate over a + * run-time value that is not known at compile time. Note that + * 1/.09 equals 11.1111. We'll use integer arithmetic on ten + * thousandths, and then divide by 10,000 after we've "rounded". + */
Please examine the comment style in gcc - no asterisks to start new lines, and comment terminators don't go on their own line.
+ sum_incoming_frequencies *= 111111; /* multiply by 11.1111 */ + sum_incoming_frequencies += 5000; /* round by adding 0.5 */ + sum_incoming_frequencies /= 10000; /* convert ten thousandths + to ones + */
These comments could also be improved, but really they should just be removed since they're pretty obvious and redundant with the one before.
+/* Define ENABLE_CHECKING to enforce the following run-time checks.
"With checking enabled, the following run-time checks are performed:"
+ * This may report false-positive errors due to round-off errors.
That doesn't sound good as it could lead to bootstrap failures when checking is enabled.
@@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo static bool fix_bb_placement (basic_block); static void fix_bb_placements (basic_block, bool *, bitmap); +/* + * Return true iff block is considered to reside within the loop + * represented by loop_ptr. + */
Arguments are capitalized in function comments.
+bool +in_loop_p (basic_block block, struct loop *loop_ptr) +{ + basic_block *bbs = get_loop_body (loop_ptr); + bool result = false; + + for (unsigned int i = 0; i < loop_ptr->num_nodes; i++) + { + if (bbs[i] == block) + result = true; + }
I think something that starts with bb->loop_father and iterates outwards would be more efficient.
+/* A list of block_ladder_rung structs is used to keep track of all the + * blocks visited in a depth-first recursive traversal of a control-flow + * graph. This list is used to detect and prevent attempts to revisit + * a block that is already being visited in the recursive traversal. + */ +typedef struct block_ladder_rung { + basic_block block; + struct block_ladder_rung *lower_rung; +} *ladder_rung_p;
I was going to suggest a vec rather than manually dealing with lists, but it looks like you're using these in a rather unusual way in a recursive walk. I'm not actually sure that what you're doing entirely prevents revisiting a block, it only does so for a single recursive call-chain, but two separate recursive calls from the same level could still visit the same blocks. I'm not sure this is intended (the usual approach would be to use a (s)bitmap of visited blocks). More comments on the whole recursion scheme below.
+/* Return true iff an_edge represents the same source and destination + * blocks as another_edge. + */ +static bool +same_edge_p (edge an_edge, edge another_edge) +{ + return ((an_edge->src == another_edge->src) + && (an_edge->dest == another_edge->dest)); +}
Formatting aside (extra parentheses), I wonder why you can't just test edges for pointer equality? Does anything make duplicate edges?
+static void +recursively_zero_frequency (struct loop *loop_ptr, vec<edge> exit_edges, + ladder_rung_p ladder_rung, + edge incoming_edge) +{ + if (incoming_edge->dest == loop_ptr->header) + return; + else if (in_edge_set_p (incoming_edge, exit_edges)) + return; + else if (in_call_chain_p (incoming_edge, ladder_rung)) + return; + else + { + struct block_ladder_rung a_rung; + basic_block block = incoming_edge->dest; + + a_rung.block = block; + a_rung.lower_rung = ladder_rung; + block->frequency = 0; + + edge_iterator ei; + edge successor; + FOR_EACH_EDGE (successor, ei, block->succs) + { + recursively_zero_frequency (loop_ptr, exit_edges, + &a_rung, successor); + } + } +}
Does this really have to be recursive, and this complicated? I think something with a worklist and maybe a bitmap to indicate visited blocks would be considerably easier to understand, and the recursion could be a problem with very large testcases.
You might want to check whether something like dfs_enumerate_from or other functions in cfganal would be helpful.
+static bool +recursion_detected_p (basic_block candidate, ladder_rung_p lower_steps) {
Don't place the open brace on the same line as the declaration.
+/* Return true iff candidate is contained within the loop represented + * by loop_header and loop_latch. + * + * We consider the block to be within the loop if there exists a path + * within the control flow graph from this node to the loop's latch + * which does not pass through the loop's header. (If all paths to + * the latch pass through the loop header, then the node is contained + * within an outer-nested loop but not within this loop.) + * + * Note that if a candidate's successor is in the loop and the successor + * is not the loop header, then the candidate itself is also in the loop. + * If none of the successors of a candidate are in the loop, then the + * candidate itself is not in the loop. + */ +static bool +in_loop_of_header_p (basic_block candidate, basic_block loop_header, + basic_block loop_latch, bool start_of_recursion, + ladder_rung_p lower_steps)
This is another function that makes me wonder why it's so complicated. If you want to know whether a basic block is inside a loop, why is looking for it in the set returned by get_loop_body insufficient, or iterating backwards from the block's loop_father? And even so I think there are probably better approaches to writing this using worklists or existing helper functions, as mentioned above.
+ return false; /* None of the successors was in loop */
Move comment before statement.
+/* Return true iff block is an element of the block_set vector. + */ +static bool +in_block_set_p (basic_block block, vec<basic_block> block_set) +{ + basic_block bb; + unsigned int u; + FOR_EACH_VEC_ELT (block_set, u, bb) + { + if (bb == block) + return true; + } + return false; +}
You probably want to use (s)bitmaps so you don't have to do this potentially expensive iteration.
+ +/* Return a vector containing all of the edges that exit the loop + * represented by the loop_blocks vector. + */ +static vec<edge> +get_exit_edges_from_loop_blocks (vec<basic_block> loop_blocks) {
How does this one differ from the existing get_loop_exit_edges? Ah, I see this:
+ /* When zero_partial_loop_frequencies is invoked, the *loop_ptr + * object is not entirely coherent, so the library service + * get_loop_exit_edges (loop_p) cannot be called from this context.
Just how incoherent is the information? Is that fixable? I assume this is the same reason why you have get_loop_blocks and are using it instead of get_loop_body? That duplication is somewhat unfortunate and it would be nice to avoid it. I think this is the core point that needs to be addressed first. If this is unavoidable (and I really hope not), then it needs to be clearly documented which set of functions should be called under which circumstances.
+ if (!in_block_set_p (edge_dest, loop_blocks)) + { + results.safe_push (successor); + }
No braces around single statements. Please fix throughout.
+ * Zero all frequencies for all blocks contained within the loop + * represented by loop_ptr which are reachable from block without + * passing through the block header. If block is not within the loop, + * this has no effect. The behavior is as outlined by the following
Lose the tab character in here.
+static void +recursively_increment_frequency (struct loop *loop_ptr, vec<edge> exit_edges, + ladder_rung_p ladder_rung, + edge incoming_edge, + int frequency_increment)
Some of these would probably fit on one line which would make this more compact.
+#ifdef ENABLE_CHECKING
We're getting rid of this #ifdef. Define unconditionally, and use flag_checking to determine whether to do the verification (if it is safe, see comment above - otherwise just make this a DEBUG_FUNCTION).
+/** + * Issue a fatal error message and abort program execution. + */ +static void +internal (const char *msg) +{ + fprintf (stderr, "Fatal internal error: %s\n", msg); + exit (-1); +}
Unnecessary, we have fatal_error.
+ * The integrity check is problematic due to round-off errors. Though
Another tab character.
@@ -1101,7 +1649,7 @@ can_duplicate_loop_p (const struct loop *loop) is redistributed evenly to the remaining edges coming from E->src. */ static void -set_zero_probability (edge e) +set_zero_probability (struct loop *loop_ptr, edge e)
Arguments should be documented in the function comment.
+ * KELVIN_TODO:
Just "TODO" if at all. Bernd