From: Dhruv Chawla <dhr...@nvidia.com>

As explained in the patch itself, this change prevents the loss of profile
information when a call was inlined in the profiled binary but the
auto-profile pass decides not to inline it. As an example, consider this code:

#define TRIP 1000000000

#ifdef DO_NOINLINE
# define INLINE __attribute__((noinline))
#else
# define INLINE __attribute__((always_inline))
#endif

INLINE int baz(int x, int y, int z) {
    if (x < TRIP / 4) {
        return y + z * 8;
    } else {
        return y * z / 2;
    }
}

__attribute__((noinline, noipa, optnone))
int passthrough(int x, int y, int z) {
    return baz(x, y, z);
}

int main() {
    for (int i = 0; i < TRIP; i++) {
        passthrough(i, i + 1, i + 2);
    }
}

This test case is first compiled without -DDO_NOINLINE. The resulting
binary is then profiled, and the profile is fed back while compiling
with -DDO_NOINLINE. As a result, baz has an inline callsite in
passthrough in the GCOV file, but it is not inlined in the FDO binary.

Compiling this with and without the patch gives the following .afdo dumps:

- With the patch:

__attribute__((noinline))
int baz (int x, int y, int z)
{
  int _1;
  int _2;
  int _3;
  int _7;
  int _8;

  <bb 2> [count: 534583]:
  if (x_4(D) <= 249999999)
    goto <bb 3>; [100.00%]
  else
    goto <bb 4>; [0.00%]

  <bb 3> [count: 534583]:
  _1 = z_6(D) * 8;
  _8 = _1 + y_5(D);
  goto <bb 5>; [100.00%]

  <bb 4> [count: 0]:
  _2 = y_5(D) * z_6(D);
  _7 = _2 / 2;

  <bb 5> [count: 534583]:
  # _3 = PHI <_8(3), _7(4)>
  return _3;

}

- Without the patch:

__attribute__((noinline))
int baz (int x, int y, int z)
{
  int _1;
  int _2;
  int _3;
  int _7;
  int _8;

  <bb 2> [local count: 1073741824]:
  if (x_4(D) <= 249999999)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  _1 = z_6(D) * 8;
  _8 = _1 + y_5(D);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870912]:
  _2 = y_5(D) * z_6(D);
  _7 = _2 / 2;

  <bb 5> [local count: 1073741824]:
  # _3 = PHI <_8(3), _7(4)>
  return _3;

}

Thus, without the patch, the profile counts for baz are lost in this
example: the dump shows only local counts with a 50/50 branch split
instead of the sampled counts.
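
The percentages in the two dumps can be cross-checked against the block
counts with simple arithmetic. The helper below is only an illustration
for reading the dumps; percent_of is a hypothetical name and this is not
how GCC computes edge probabilities internally.

```c
#include <assert.h>

/* Hypothetical helper for reading the dumps above: express one block
   count as a percentage of another.  Not part of GCC.  */
static double percent_of(long part, long whole)
{
    assert(part >= 0 && whole >= 0);
    if (whole == 0)
        return 0.0;  /* no samples at all: report 0% rather than divide by zero */
    return 100.0 * (double)part / (double)whole;
}
```

With the patch, bb 3 carries all 534583 samples of bb 2 and bb 4 carries
none, which matches the 100.00% / 0.00% split. Without the patch, both
successors carry exactly half of bb 2's local count, which matches the
50.00% / 50.00% split.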

While developing this patch, a few other points also came up:

- Annotation, merging and inlining form a messy set of dependencies in
  the auto-profile pass. The order in which functions are annotated
  affects the decisions that the inliner makes, but that order is
  effectively random because functions are visited with
  FOR_EACH_FUNCTION.

- The main issue is that annotation is performed after inlining. This is
  meant to mirror the hot path in the profiled binary more accurately.
  However, there is no guarantee of this, because the order of
  visitation is effectively random.

- Consider the following example:

  int foo () { <...> }
  int bar_1 () { <...> foo (); <...> }
  int bar_2 () { <...> foo (); <...> }
  int bar_3 () { <...> foo (); <...> }

  If foo was always inlined in all three bar_<n> functions, the profile
  information will contain inline callsites for all bar_<n> functions.
  There will be no separate profile information for foo in the GCOV file.
  If auto-profile visits them in the order bar_1 -> foo -> bar_2 ->
  bar_3, it is possible that inlining could fail in bar_1 because foo
  would not have any profile counts associated with it. If foo was
  visited first, then that decision could change. This non-determinism
  raises the question of splitting out:

  1. Merging inline callsites into outline copies
  2. Annotating functions
  3. Inlining callsites

  into separate phases in auto-profile, where each effectively executes as
  a sub-pass. As the cgraph is only modified in phase 3, the order of
  visiting functions should not matter, at least in phases 1 and 2. Does
  this sound okay?
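
To make the ordering argument concrete, here is a toy model in C. It is
not GCC code: struct toy_fn, visit_interleaved and visit_phased are
hypothetical names, and the model assumes that inline-callsite samples
for foo are folded into foo's outline count as each function is visited.
It only illustrates why a separate merge phase removes the dependence on
visitation order.

```c
#include <assert.h>

/* Toy model only: these types and names are hypothetical, not GCC's.  */
#define N_FUNCS 4

struct toy_fn {
    const char *name;
    long inline_foo_samples; /* samples for foo's body inlined into this function */
    long foo_count_seen;     /* foo's merged count when this function was processed */
};

/* Interleaved scheme: merge and decide in a single walk.  What each
   function sees for foo depends on how many functions came before it.  */
static void visit_interleaved(struct toy_fn *fns, const int *order)
{
    long foo_outline = 0;
    for (int i = 0; i < N_FUNCS; i++) {
        assert(order[i] >= 0 && order[i] < N_FUNCS);
        struct toy_fn *f = &fns[order[i]];
        f->foo_count_seen = foo_outline;      /* input to the inline decision */
        foo_outline += f->inline_foo_samples; /* merged only after the decision */
    }
}

/* Phased scheme: merge everything first, then decide.  Every function
   sees the same total for foo, whatever the visitation order.  */
static void visit_phased(struct toy_fn *fns, const int *order)
{
    long foo_outline = 0;
    for (int i = 0; i < N_FUNCS; i++) {       /* phase 1: merge */
        assert(order[i] >= 0 && order[i] < N_FUNCS);
        foo_outline += fns[order[i]].inline_foo_samples;
    }
    for (int i = 0; i < N_FUNCS; i++)         /* phases 2 and 3: annotate, decide */
        fns[order[i]].foo_count_seen = foo_outline;
}
```

With made-up sample counts of 100, 200 and 300 for the three bar_<n>
functions, visiting bar_1 first in the interleaved scheme leaves it with
a count of 0 for foo, while visiting it last gives 500. In the phased
scheme every function sees 600 in either order.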

Splitting out inlining as its own phase also means that it can
eventually be handed off to ipa-inline to handle, thus making
auto-profile independent of early inline. This will simplify the code a
fair bit. Is this a good direction to go in?

Bootstrapped and regtested on aarch64-linux-gnu.

Dhruv Chawla (1):
  [RFC][AutoFDO] Propagate information to outline copies if not inlined

 gcc/auto-profile.cc | 72 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 63 insertions(+), 9 deletions(-)

-- 
2.44.0
