On Mon, Nov 9, 2015 at 6:35 PM, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 11/07/2015 03:44 PM, Kelvin Nilsen wrote:
>>
>> This is a draft patch to partially address the concerns described in
>> bugzilla problem report
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is
>> incomplete in the sense that there are some known shortcomings with
>> nested loops which I am still working on.  I am sending this out for
>> comments at this time because we would like these patches to be
>> integrated into the GCC 6 release and want to begin responding to
>> community feedback as soon as possible in order to make the integration
>> possible.
>
>
> I'll mainly comment on style points right now. Your code generally looks
> mostly good but doesn't quite follow our guidelines. In terms of logic I'm
> sure there will be followup questions after the first round of points is
> addressed. Others might be better qualified to review the frequencies
> manipulation; for this first round I'm just assuming that the general idea
> is sound (but would appreciate input).
>
>> 1. Before a loop body is unpeeled into a pre-header location, we
>> temporarily adjust the loop body frequencies to represent the values
>> appropriate for the context into which the loop body is to be
>> copied.
>>
>> 2. After unrolling the loop body (by replicating the loop body (N-1)
>> times within the loop), we recompute all frequencies associated with
>> blocks contained within the loop.
>
>
> If these are independent from each other it might be better to split up the
> patch.
>
>>          (check_loop_frequency_integrity): new helper routine
>>          (set_zero_probability): added another parameter
>>          (duplicate_loop_to_header_edge): Add code to recompute loop
>> body frequencies after blocks are replicated (unrolled) into the loop
>> body. Introduce certain help routines because existing infrastructure
>> routines are not reliable during typical executions of
>> duplicate_loop_to_header_edge().
>
>
> Please review our guidelines how to write ChangeLogs - capitalize,
> punctuate, and document only the what, not the why. Also, linewrap manually.
>
>>     opt_info_start_duplication (opt_info);
>> +
>>     ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
>
>
> Please make sure you don't change whitespace unnecessarily. There are a few
> other occurrences in the patch, and also cases where you seem to be adding
> spaces to otherwise blank lines.
>
>> @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop)
>>     bitmap_clear_bit (wont_exit, may_exit_copy);
>>     opt_info_start_duplication (opt_info);
>>
>> +  {
>> +    /* Recompute the loop body frequencies. */
>> +    zero_loop_frequencies (loop);
>
>
> No reason to start a braced block here.
>
>> +    /* Scale the incoming frequencies according to the heuristic that
>> +     * the loop frequency is the incoming edge frequency divided by
>> +     * 0.09.  This heuristic applies only to loops that iterate over a
>> +     * run-time value that is not known at compile time.  Note that
>> +     * 1/.09 equals 11.1111.  We'll use integer arithmetic on ten
>> +     * thousandths, and then divide by 10,000 after we've "rounded".
>> +     */
>
>
> Please examine the comment style in gcc - no asterisks to start new lines,
> and comment terminators don't go on their own line.
>
>> +    sum_incoming_frequencies *= 111111;  /* multiply by 11.1111 */
>> +    sum_incoming_frequencies += 5000;    /* round by adding 0.5 */
>> +    sum_incoming_frequencies /= 10000;  /* convert ten thousandths
>> +                                           to ones
>> +                                        */
>
>
> These comments could also be improved, but really they should just be
> removed since they're pretty obvious and redundant with the one before.
>
>> +/* Define ENABLE_CHECKING to enforce the following run-time checks.
>
>
> "With checking enabled, the following run-time checks are performed:"
>
>> + * This may report false-positive errors due to round-off errors.
>
>
> That doesn't sound good as it could lead to bootstrap failures when checking
> is enabled.
>
>> @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo
>>   static bool fix_bb_placement (basic_block);
>>   static void fix_bb_placements (basic_block, bool *, bitmap);
>>
>> +/*
>> + * Return true iff block is considered to reside within the loop
>> + * represented by loop_ptr.
>> + */
>
>
> Arguments are capitalized in function comments.
>
>> +bool
>> +in_loop_p (basic_block block, struct loop *loop_ptr)
>> +{
>> +  basic_block *bbs = get_loop_body (loop_ptr);
>> +  bool result = false;
>> +
>> +  for (unsigned int i = 0; i < loop_ptr->num_nodes; i++)
>> +    {
>> +      if (bbs[i] == block)
>> +       result = true;
>> +    }
>
>
> I think something that starts with bb->loop_father and iterates outwards
> would be more efficient.

Well, either bb->loop_father == loop_ptr or that with ||
flow_loop_nested_p (loop_ptr, bb->loop_father).

>> +/* A list of block_ladder_rung structs is used to keep track of all the
>> + * blocks visited in a depth-first recursive traversal of a control-flow
>> + * graph.  This list is used to detect and prevent attempts to revisit
>> + * a block that is already being visited in the recursive traversal.
>> + */
>> +typedef struct block_ladder_rung {
>> +  basic_block block;
>> +  struct block_ladder_rung *lower_rung;
>> +} *ladder_rung_p;
>
>
> I was going to suggest a vec rather than manually dealing with lists, but it
> looks like you're using these in a rather unusual way in a recursive walk.
> I'm not actually sure that what you're doing entirely prevents revisiting a
> block, it only does so for a single recursive call-chain, but two separate
> recursive calls from the same level could still visit the same blocks. I'm
> not sure this is intended (the usual approach would be to use a (s)bitmap of
> visited blocks). More comments on the whole recursion scheme below.

You can also use the BB_VISITED flag.  Or dfs_enumerate_from ()

>> +/* Return true iff an_edge represents the same source and destination
>> + * blocks as another_edge.
>> + */
>> +static bool
>> +same_edge_p (edge an_edge, edge another_edge)
>> +{
>> +  return ((an_edge->src == another_edge->src)
>> +         && (an_edge->dest == another_edge->dest));
>> +}
>
>
> Formatting aside (extra parentheses), I wonder why you can't just test edges
> for pointer equality? Does anything make duplicate edges?

We don't allow duplicate edges apart from edges with different flags (an EH
edge may appear alongside a FALLTRHU one for example).

Some more general comments.  The patch lacks a testcase (or a few).
We do have general checking of frequencies in cfg.c:check_bb_profile which
will cause dump files (of the unrolling dump for example) to contain
stuff like "Invalid sum of incoming frequencies..."  A testcase would be
sth that has them before but not after the patches.

>> +static void
>> +recursively_zero_frequency (struct loop *loop_ptr, vec<edge> exit_edges,
>> +                           ladder_rung_p ladder_rung,
>> +                           edge incoming_edge)
>> +{
>> +  if (incoming_edge->dest == loop_ptr->header)
>> +    return;
>> +  else if (in_edge_set_p (incoming_edge, exit_edges))
>> +    return;
>> +  else if (in_call_chain_p (incoming_edge, ladder_rung))
>> +    return;
>> +  else
>> +    {
>> +      struct block_ladder_rung a_rung;
>> +      basic_block block = incoming_edge->dest;
>> +
>> +      a_rung.block = block;
>> +      a_rung.lower_rung = ladder_rung;
>> +      block->frequency = 0;
>> +
>> +      edge_iterator ei;
>> +      edge successor;
>> +      FOR_EACH_EDGE (successor, ei, block->succs)
>> +       {
>> +         recursively_zero_frequency (loop_ptr, exit_edges,
>> +                                     &a_rung, successor);
>> +       }
>> +    }
>> +}
>
>
> Does this really have to be recursive, and this complicated? I think
> something with a worklist and maybe a bitmap to indicate visited blocks
> would be considerably easier to understand, and the recursion could be a
> problem with very large testcases.
>
> You might want to check whether something like dfs_enumerate_from or other
> functions in cfganal would be helpful.

Simply iterating over all BBs returned from get_loop_body would be sufficient.

>> +static bool
>> +recursion_detected_p (basic_block candidate, ladder_rung_p lower_steps) {
>
>
> Don't place the open brace on the same line as the declaration.
>
>> +/* Return true iff candidate is contained within the loop represented
>> + * by loop_header and loop_latch.
>> + *
>> + * We consider the block to be within the loop if there exists a path
>> + * within the control flow graph from this node to the loop's latch
>> + * which does not pass through the loop's header.  (If all paths to
>> + * the latch pass through the loop header, then the node is contained
>> + * within an outer-nested loop but not within this loop.)
>> + *
>> + * Note that if a candidate's successor is in the loop and the successor
>> + * is not the loop header, then the candidate itself is also in the loop.
>> + * If none of the successors of a candidate are in the loop, then the
>> + * candidate itself is not in the loop.
>> + */
>> +static bool
>> +in_loop_of_header_p (basic_block candidate, basic_block loop_header,
>> +                     basic_block loop_latch, bool start_of_recursion,
>> +                     ladder_rung_p lower_steps)
>
>
> This is another function that makes me wonder why it's so complicated. If
> you want to know whether a basic block is inside a loop, why is looking for
> it in the set returned by get_loop_body insufficient, or iterating backwards
> from the block's loop_father? And even so I think there are probably better
> approaches to writing this using worklists or existing helper functions, as
> mentioned above.

This can again be answered by using candidate->loop_father,
loop_header->loop_father and flow_loop_nested_p.  You don't even
need to know the latch.

+static vec<basic_block>
+recursively_get_loop_blocks (basic_block candidate, vec<basic_block> results,
+                            basic_block loop_header, basic_block loop_latch)
+{

+/* Return a vector containing all of the blocks contained within the
+ * loop identified by loop_ptr.
+ */
+static vec<basic_block>
+get_loop_blocks (struct loop *loop_ptr)
+{

this is just get_loop_body ()?

>> +      return false;              /* None of the successors was in loop
>> */
>
>
> Move comment before statement.
>
>> +/* Return true iff block is an element of the block_set vector.
>> + */
>> +static bool
>> +in_block_set_p (basic_block block, vec<basic_block> block_set)
>> +{
>> +  basic_block bb;
>> +  unsigned int u;
>> +  FOR_EACH_VEC_ELT (block_set, u, bb)
>> +    {
>> +      if (bb == block)
>> +       return true;
>> +    }
>> +  return false;
>> +}
>
>
> You probably want to use (s)bitmaps so you don't have to do this potentially
> expensive iteration.
>
>> +
>> +/* Return a vector containing all of the edges that exit the loop
>> + * represented by the loop_blocks vector.
>> + */
>> +static vec<edge>
>> +get_exit_edges_from_loop_blocks (vec<basic_block> loop_blocks) {
>
>
> How does this one differ from the existing get_loop_exit_edges? Ah, I see
> this:
>
>> +  /* When zero_partial_loop_frequencies is invoked, the *loop_ptr
>> +   * object is not entirely coherent, so the library service
>> +   * get_loop_exit_edges (loop_p) cannot be called from this context.
>
>
> Just how incoherent is the information? Is that fixable? I assume this is
> the same reason why you have get_loop_blocks and are using it instead of
> get_loop_body? That duplication is somewhat unfortunate and it would be nice
> to avoid it. I think this is the core point that needs to be addressed
> first. If this is unavoidable (and I really hope not), then it needs to be
> clearly documented which set of functions should be called under which
> circumstances.

Just compute the sets before you mess with the loop then?  Also I don't
see why changing frequencies makes any of the CFG related functions
not function.

Richard.

>> +         if (!in_block_set_p (edge_dest, loop_blocks))
>> +           {
>> +             results.safe_push (successor);
>> +           }
>
>
> No braces around single statements. Please fix throughout.
>
>> + * Zero all frequencies for all blocks contained within the loop
>> + * represented by loop_ptr which are reachable from block without
>> + * passing through the block header.  If block is not within the loop,
>> + * this has no effect. The behavior is as outlined by the following
>
>
> Lose the tab character in here.
>
>> +static void
>> +recursively_increment_frequency (struct loop *loop_ptr, vec<edge>
>> exit_edges,
>> +                                ladder_rung_p ladder_rung,
>> +                                edge incoming_edge,
>> +                                int frequency_increment)
>
>
> Some of these would probably fit on one line which would make this more
> compact.
>
>> +#ifdef ENABLE_CHECKING
>
>
> We're getting rid of this #ifdef. Define unconditionally, and use
> flag_checking to determine whether to do the verification (if it is safe,
> see comment above - otherwise just make this a DEBUG_FUNCTION).
>
>> +/**
>> + * Issue a fatal error message and abort program execution.
>> + */
>> +static void
>> +internal (const char *msg)
>> +{
>> +  fprintf (stderr, "Fatal internal error: %s\n", msg);
>> +  exit (-1);
>> +}
>
>
> Unnecessary, we have fatal_error.
>
>> + * The integrity check is problematic due to round-off errors. Though
>
>
> Another tab character.
>
>> @@ -1101,7 +1649,7 @@ can_duplicate_loop_p (const struct loop *loop)
>>      is redistributed evenly to the remaining edges coming from E->src.
>> */
>>
>>   static void
>> -set_zero_probability (edge e)
>> +set_zero_probability (struct loop *loop_ptr, edge e)
>
>
> Arguments should be documented in the function comment.
>
>> +   * KELVIN_TODO:
>
>
> Just "TODO" if at all.
>
>
> Bernd

Reply via email to