On 09/29/14 08:19, Teresa Johnson wrote:
Just an update - I found some good test cases by compiling the
c-torture tests with profile feedback with and without my patch. But
in the cases I pulled out I saw that there were still a couple profile
or probability insanities introduced by jump threading (albeit far
less than before), so I wanted to investigate before I commit. I ran
out of time this week and will not get to this until I get back from
vacation the week after next.
Hi Jeff,
I finally had a chance to get back to this and look at the remaining
insanities in the new test cases I created. It turns out that there
were still a few issues in the case where there were guessed
frequencies and no profile counts. The two test cases I created do use
FDO, and the insanities in the routines with profile counts went away
with my patch. But the outlined copies of routines that were also
inlined into the main routine still had estimated frequencies, and
these still had a few issues.
The problem is that the profile updates are done incrementally as we
walk and update the paths in ssa_fix_duplicate_block_edges, including
the block and edge counts, the block frequencies and the
probabilities. This is very difficult to do when only operating on
frequencies since the edge frequencies are derived from the source
block frequency and the probability. Therefore, once the source block
frequency is updated, the edge frequency is also affected, and it is
really difficult to figure out what the update to the edge frequency
(essentially the probability) is using the same incremental update
approach. I was attempting to handle this with the routine
deduce_freq, for example, but this turned out to have issues for
certain types of paths. I tried a few other approaches, but they start
looking really ugly and I didn't want to add a parallel but different
algorithm in the case of no profile counts.
So by far the simplest approach was simply to take a snapshot of the
existing block and edge frequencies along the path before we start the
updates in ssa_fix_duplicate_block_edges, by copying them into the
profile count fields of those blocks and edges. Then the existing
algorithm operates the same as when we do have counts, and can
essentially operate incrementally on the edge frequencies since they
live in the count field of the edge and are no longer affected anytime
the source block is updated. Since the algorithm does update block
frequencies and probabilities as well (based on the count updates
performed), we can simply clear out these fake count fields at the end
of ssa_fix_duplicate_block_edges. This takes care of the remaining
insanities introduced by jump threading from these test cases. During
testing I also added in some checking to ensure that the count fields
for the whole routine were cleared properly to make sure the new
clear_counts_path was not missing anything (checking is a little too
heavyweight to add in normally).
New patch below (also attached since my mailer sometimes eats spaces).
The differences between the old patch and the new one:
- removed deduce_freq (which was my least favorite part of the patch
anyway!), and its call from recompute_probabilities, since it is no
longer necessary.
- two new routines freqs_to_counts_path and clear_counts_path, invoked
from ssa_fix_duplicate_block_edges.
- two new tests
Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
Thanks,
Teresa
gcc:
2014-09-29 Teresa Johnson <tejohn...@google.com>
* tree-ssa-threadupdate.c (struct ssa_local_info_t): New
duplicate_blocks bitmap.
(remove_ctrl_stmt_and_useless_edges): Ditto.
(create_block_for_threading): Ditto.
(compute_path_counts): New function.
(update_profile): Ditto.
(recompute_probabilities): Ditto.
(update_joiner_offpath_counts): Ditto.
(freqs_to_counts_path): Ditto.
(clear_counts_path): Ditto.
(ssa_fix_duplicate_block_edges): Update profile info.
(ssa_create_duplicates): Pass new parameter.
(ssa_redirect_edges): Remove old profile update.
(thread_block_1): New duplicate_blocks bitmap,
remove old profile update.
(thread_single_edge): Pass new parameter.
gcc/testsuite:
2014-09-29 Teresa Johnson <tejohn...@google.com>
* testsuite/gcc.dg/tree-prof/20050826-2.c: New test.
* testsuite/gcc.dg/tree-prof/cmpsf-1.c: Ditto.
Given I'd already been through this pretty thoroughly, I just gave this
a cursory review.
clear_counts_path needs a function comment. It's pretty obvious what
it's doing, but for completeness let's go ahead and get the obvious
comment in there.
With that fix, approved for the trunk. Thanks for taking the time to
sort out all these issues.
jeff