It appears to me that both loop invariant motion passes (tree/rtl) don't
look at basic block frequencies and will gladly hoist invariant code
from a cold block within a loop. This can impact performance by
executing (possibly costly) code that would otherwise not be executed,
adds another register that is live through the loop, and possibly
increases number of callee-saved registers used in the procedure (which
require save/restore) if there is a call in the loop. Following
simplified test was patterned after code observed in Boost library,
which apparently has a fair number of asserts sprinkled throughout.
volatile int x;
void assert_fail(int, char *, char *);
void foo(int *a, int n, int k) {
int i;
for(i=0; i<n; i++) {
if (__builtin_expect(x,0))
assert_fail(k / 5, "one", "two");
a[i] = k;
}
}
Compiling the above with -O2, tree-LIM (tree-ssa-loop-im.c) will yank
the 'k/5' expression out of the loop and rtl-LIM (loop-invariant.c) will
yank the address computations for the 2 strings out of the loop. The
default probability for __builtin_expect is 90%, but even adding --param
builtin-expect-probability=100 to the compile such that the block
containing the call looks to never be executed still results in
invariant hoisting.
A simple change like the following prevents the hoisting in the rtl
pass, I assume something similar could also be added to the tree pass.
But I'd like to hear others thoughts on whether this is an acceptable
approach/issue that should be fixed. The code below prevents motion if
the block containing it is executed <= 10% of the time the loop header
is. Is there a better approach there instead of another magic number?
Should it be based on PARAM_VALUE (BUILTIN_EXPECT_PROBABILITY)? Basing
on BUILTIN_EXPECT_PROBABILITY seems to make sense to me for the above
case, but what about the more general case where __builtin_expect()
isn't used and branch probabilities are guessed based on other heurstics?
Index: loop-invariant.c
===================================================================
--- loop-invariant.c (revision 216843)
+++ loop-invariant.c (working copy)
@@ -1000,6 +1000,11 @@ find_invariants_bb (basic_block bb, bool
{
rtx_insn *insn;
+
+ /* Don't move invariants from cold block inside the loop. */
+ if (bb->frequency <= (bb->loop_father->header->frequency / 10) )
+ return;
+
FOR_BB_INSNS (bb, insn)
{
if (!NONDEBUG_INSN_P (insn))
Thanks for any feedback,
Pat