On 03/26/14 17:44, Teresa Johnson wrote:
Recently I discovered that the profile updates being performed by jump
threading were incorrect in many cases, particularly in the case where
the threading path contains a joiner. Some of the duplicated
blocks/edges were not getting any counts, leading to incorrect
function splitting and other downstream optimizations, and there were
other insanities as well. After making a few attempts to fix the
handling I ended up completely redesigning the profile update code,
removing a few places throughout the code where it was attempting to
do some updates.

The biggest complication (see the large comment and example above the
new routine compute_path_counts) is that we duplicate a conditional
jump in the joiner case, possibly multiple times for multiple jump
thread paths through that joiner, and it isn't trivial to figure out
what probability to assign each of the duplicated successor edges (and
the original after threading). Each jump thread path may need to have
a different probability of staying on path through the joiner in order
to keep the counts going out of the threading path sane.

The patch below was bootstrapped and tested on
x86_64-unknown-linux-gnu, and also tested with a profiledbootstrap. I
additionally tested with cpu2006, confirming that the amount of
resulting cycle samples in the split cold sections reduced, and
through manual inspection that many different cases were now correct.
I also measured performance with cpu2006, running each benchmark
multiple times on a Westmere and see some speedups (453.povray 1-2%,
403.gcc 1-1.5%, and noisy but positive speedups in 471.omnetpp and
483.xalancbmk).

Looks like my mailer is corrupting the spacing, which makes it harder
to look at the CFG examples in the big header comment block I added.
So I have also included the patch as an attachment.

Ok for stage 1?

Thanks,
Teresa

  2014-03-26  Teresa Johnson  <tejohn...@google.com>

         * tree-ssa-threadupdate.c (struct ssa_local_info_t): New
         duplicate_blocks bitmap.
         (remove_ctrl_stmt_and_useless_edges): Ditto.
         (create_block_for_threading): Ditto.
         (compute_path_counts): New function.
         (update_profile): Ditto.
         (deduce_freq): Ditto.
         (recompute_probabilities): Ditto.
         (update_joiner_offpath_counts): Ditto.
         (ssa_fix_duplicate_block_edges): Update profile info.
         (ssa_create_duplicates): Pass new parameter.
         (ssa_redirect_edges): Remove old profile update.
         (thread_block_1): New duplicate_blocks bitmap,
         remove old profile update.
         (thread_single_edge): Pass new parameter.
First off, sorry this took so long to get reviewed.

Most of what's going on in here is similar to something I sketched out, but never coded up a while back -- with the significant difference that you're handling joiner blocks as well.

Everything looks to be well thought through and documented in the code at a level I wish existed throughout GCC.

The only thing I see missing is regression tests. I don't think you need to do anything huge here, but it ought to be possible to set up relatively simple cases which show the probabilities/counts being updated properly.

Otherwise it looks excellent. It's pre-approved once you've added some kind of testing and fixed the nits noted below.


+   In the aboe example, after all jump threading is complete, we will
s/aboe/above/


+  struct el *next, *el;
+  bitmap in_edge_srcs = BITMAP_ALLOC (NULL);
+  for (el = rd->incoming_edges; el; el = next)
+    {
+      next = el->next;
+      bitmap_set_bit (in_edge_srcs, el->e->src->index);
+    }
Please add vertical whitespace after this loop, but before declaring variables for the next loop.

Jeff

Reply via email to