Hi Honza and All,
After more checks, I'm thinking these patches may still be useful.
For patch 1:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
This patch recalculates the loop's BB-count and could correct
some BB-count mismatch for loops which has a single exit.
From the test result, we could say it reduce mismatched BB-counts
slightly.
For patch 2:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
I updated as below:
It reset the loop's probability when the loop count becomes
unrealistically
small. In theory, it seems this would be the right direction to do
this.
Bootstrap/regtest on powerpc64le with no new regressions. I'm thinking
if
this is acceptable for trunk?
BR,
Jiufu Guo
Subject: Reset edge probability and BB-count for peeled/unrolled loop
This patch fix handles the case where unrolling in an unreliable count
number can cause a loop to no longer look hot and therefore not get
aligned.
This patch scale by profile_probability::likely () if unrolled count
gets
unrealistically small. And this patch fixes the COUNT/PROB of peeled
loop.
gcc/ChangeLog:
2021-07-01 Jiufu Guo <guoji...@linux.ibm.com>
Pat Haugen <pthau...@us.ibm.com>
PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Reset probablity
of unrolled/peeled loop.
testsuite/ChangeLog:
2021-07-01 Jiufu Guo <guoji...@linux.ibm.com>
Pat Haugen <pthau...@us.ibm.com>
PR rtl-optimization/68212
* gcc.dg/pr68212.c: New test.
---
gcc/cfgloopmanip.c | 20 ++++++++++++++++++--
gcc/testsuite/gcc.dg/pr68212.c | 13 +++++++++++++
2 files changed, 31 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr68212.c
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 4a9ab74642c..29d858c878a 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1258,14 +1258,30 @@ duplicate_loop_to_header_edge (class loop *loop,
edge e,
/* If original loop is executed COUNT_IN times, the unrolled
loop will account SCALE_MAIN_DEN times. */
scale_main = count_in.probability_in (scale_main_den);
+
+ /* If we are guessing at the number of iterations and count_in
+ becomes unrealistically small, reset probability. */
+ if (!(count_in.reliable_p () || loop->any_estimate))
+ {
+ profile_count new_count_in = count_in.apply_probability
(scale_main);
+ profile_count preheader_count = loop_preheader_edge
(loop)->count ();
+ if (new_count_in.apply_scale (1, 10) < preheader_count)
+ scale_main = profile_probability::likely ();
+ }
+
scale_act = scale_main * prob_pass_main;
}
else
{
+ profile_count new_loop_count;
profile_count preheader_count = e->count ();
- for (i = 0; i < ndupl; i++)
- scale_main = scale_main * scale_step[i];
scale_act = preheader_count.probability_in (count_in);
+ /* Compute final preheader count after peeling NDUPL copies. */
+ for (i = 0; i < ndupl; i++)
+ preheader_count = preheader_count.apply_probability
(scale_step[i]);
+ /* Subtract out exit(s) from peeled copies. */
+ new_loop_count = count_in - (e->count () - preheader_count);
+ scale_main = new_loop_count.probability_in (count_in);
}
}
diff --git a/gcc/testsuite/gcc.dg/pr68212.c
b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 00000000000..e0cf71d5202
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param
max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+ long int i;
+
+ for (i = 0; i < n; i++)
+ a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1
"alignments"} } */
+
--
2.17.1
On 2021-06-18 16:24, guojiufu via Gcc-patches wrote:
On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:
On 2021-06-14 17:16, Jan Hubicka wrote:
On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping :-)
OK for the trunk. Sorry for the long delay. I kept hoping someone
else
would step in and look at it.
Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work. First
the exit loop frequency is set to
prob = profile_probability::always ().apply_scale (1, new_est_niter
+ 1);
which is only correct if the estimated number of iterations is
accurate.
If we do not have profile feedback and trip count is not known
precisely
in most cases it won't be. We estimate loops to iterate about 3
times
and then niter_for_unrolled_loop will apply the capping to 5
iterations
that is completely arbitrary.
Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is
quite
oposite with the fact that we are just unrolling it thinking it is
hot).
Thanks, understand your concern, both new and old code are assuming
the
the number of iterations is accurate.
Maybe we could add code to reset exit probability for the case
where "!count_in.reliable_p ()".
Old code does
1) scale body down so only one iteration is done
2) set exit edge probability to be 1/(new_est_iter+1)
precisely
3) scale up accoring to the 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability but that does not seem to happen.
New code does
Yes, this is intended: we know that the enter-count should be
equal to the exit-count of one loop, and then the
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop would not be changed before and
after
one optimization (or slightly change,e.g. peeling count).
Based on this, we could adjust the loop body count according to
exit-count (or say enter-count) and exit-probability, when the
exit-probability is easy to estimate.
1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorizaiton
Hi Honza, and guys:
I just had a statistic for bootstrap/test and spec2017 build and find
there are ~1700 times of single loops are hit this code; in spec2017
build,
it hits 226 single-exit loops, and multi-exit loops are not hit.
Had a test with profile-report to see "mismatch count', with these
patches
we may say the "mismatch count' is mitigated slightly, but not very
aggressive:
150 mismatch counts are reduced.
But 119 mismatch counts are increased.
Any comments about this patch? Is it acceptable for the trunk? Thanks.
BR,
Jiufu Guo.
The computation in the new code is based on a single exit. This is
also a requirement of old code, and it would be true when run to here.
2) adjust loop body count according to the exit
3) updat profile of BB after the exit edge.
Why do you need:
+ if (current_ir_type () != IR_GIMPLE)
+ update_br_prob_note (exit->src);
It is tree_transform_and_unroll_loop, so I think we should always
have
IR_GIMPLE?
These two lines are added to "recompute_loop_frequencies" which can be
used
in rtl, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines code would be put to
tree_transform_and_unroll_loop
instead of common code recompute_loop_frequencies.
Thanks a lot for the review in your busy time!
BR.
Jiufu Guo
Honza
jeff