https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116743

--- Comment #6 from Rama Malladi <rvmallad at amazon dot com> ---
I am trying to create a reproducer for this issue. In the interim, I wanted to
share some stats from the MySQL build that highlight the issue with GCC 12.3.0
vs. 11.5.0.

Executable Size (B)        Baseline          AutoFDO
GCC 11.5.0                 1,577,584,040     1,622,046,224
GCC 12.3.0                 1,525,753,912     1,602,504,784
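
To make the size comparison easier to read, here is a small sketch that
computes the AutoFDO size growth over the baseline for each compiler. The
numbers are copied from the table above; the helper name growth_pct is only
illustrative and not part of any tool.

```python
# Executable sizes in bytes, copied from the table above:
# (baseline, AutoFDO) per compiler version.
sizes = {
    "GCC 11.5.0": (1_577_584_040, 1_622_046_224),
    "GCC 12.3.0": (1_525_753_912, 1_602_504_784),
}


def growth_pct(baseline, autofdo):
    """Relative size growth of the AutoFDO build over the baseline, in percent."""
    return 100.0 * (autofdo - baseline) / baseline


for compiler, (baseline, autofdo) in sizes.items():
    delta = autofdo - baseline
    print(f"{compiler}: {delta:+,} B ({growth_pct(baseline, autofdo):+.2f}%)")
```

On these figures the AutoFDO build grows by roughly 2.8% with GCC 11.5.0 and
roughly 5.0% with GCC 12.3.0.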

To dig further, I collected inlining stats from the compiler opt-report.

GCC 11.5.0                              Baseline        PGO
missed_not_inlinable - Total            1,716,533       1,745,229
max-inline-insns-auto limit reached        61,236          63,220
function body not available             1,052,113       1,061,064
can be overwritten at link time            94,640          97,968
function not inlinable                      6,808           6,799
large-stack-frame-growth limit reached     18,789          26,347
inline-unit-growth limit reached          327,025         338,244
call is unlikely and code size grow       100,979         103,536

GCC 12.3.0                              Baseline        PGO
missed_not_inlinable - Total            1,838,987       1,728,352
max-inline-insns-auto limit reached        59,480          61,981
function body not available               927,989         944,740
can be overwritten at link time            92,993          93,044
function not inlinable                      7,042           7,047
large-stack-frame-growth limit reached     18,629          18,483
inline-unit-growth limit reached          497,948         367,222
call is unlikely and code size grow       180,154         185,387
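
For anyone who wants to gather similar counts, here is a minimal sketch of a
tally over the opt-report text. It assumes the report was produced with
something like -fopt-info-inline-missed and that each missed inline is a
single line of the form "missed: not inlinable: <caller>/<id> ->
<callee>/<id>, <reason>". The function name and the sample lines are
hypothetical, and this is not necessarily how the numbers above were
produced.

```python
import re
from collections import Counter

# Assumed line shape (one report entry per line, optional location prefix):
#   ... missed: not inlinable: <caller>/<id> -> <callee>/<id>, <reason>
# The reason is taken to be everything after the callee's "/<id>, ".
MISSED_RE = re.compile(r"missed:\s+not inlinable: .+ -> .+/\d+, (?P<reason>.+)$")


def tally_missed_reasons(lines):
    """Count how often each 'not inlinable' reason appears in the report lines."""
    counts = Counter()
    for line in lines:
        match = MISSED_RE.search(line.rstrip())
        if match:
            counts[match.group("reason")] += 1
    return counts


# Hypothetical sample lines in the assumed format.
sample = [
    "missed:   not inlinable: bool f(int, char*)/10 -> void g(int*)/20, "
    "call is unlikely and code size would grow",
    "missed:   not inlinable: int h()/11 -> void g(int*)/20, "
    "call is unlikely and code size would grow",
    "missed:   not inlinable: int h()/11 -> int k(long)/30, "
    "function body not available",
    "optimized:  Inlined void g(int*)/20 into int m()/12",
]

for reason, count in tally_missed_reasons(sample).most_common():
    print(f"{count:>8,}  {reason}")
```

In practice the lines would be read from the report file, e.g.
tally_missed_reasons(open(path)), where path is wherever the opt-report was
written.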

I tried to understand the "call is unlikely and code size would grow" misses.
Here is an example function that was inlined in GCC 11.5.0 both without and
with PGO. In GCC 12.3.0, it was inlined without PGO but not with PGO.

GCC 11.5.0:
w/o PGO:
optimized:  Inlined void mem_heap_free(mem_heap_t*)/8218 into bool
btr_pcur_t::restore_position(ulint, mtr_t*, ut::Location)/16707 which now has
time 186.710665 and size 255, net change of -5.

w PGO:
optimized:  Inlined void mem_heap_free(mem_heap_t*)/8218 into bool
btr_pcur_t::restore_position(ulint, mtr_t*, ut::Location)/16707 which now has
time 501561.786133 and size 227, net change of -6.

GCC 12.3.0:
w/o PGO:
optimized:  Inlined void mem_heap_free(mem_heap_t*)/8120 into bool
btr_pcur_t::restore_position(ulint, mtr_t*, ut::Location)/16600 which now has
time 186.710665 and size 255, net change of -5.

w PGO:
missed:   not inlinable: bool btr_pcur_t::restore_position(ulint, mtr_t*,
ut::Location)/16600 -> void mem_heap_free(mem_heap_t*)/8120, call is unlikely
and code size would grow

The inlining decisions are thrown off by the AutoFDO count propagation
algorithm: the inlining of a callee (with one edge) is affected by the perf
stats of the caller.
