https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I am sorry for getting back to this so late.  During this stage1 Martin and I
spent some time improving the ipa-cp profile updating and looked again into how
realistic the profile is.  Also, codegen has recently improved somewhat due to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227 and due to modref
propagation.

I still believe that for trunk GCC we should not have a patch that intentionally
makes the profile unrealistic just to make IRA work better by accident, since it
does not seem to help anything in the real world except this somewhat odd
benchmark.  So I wonder if we can make the profile work better for IRA without
actually making it unrealistic and tampering with the ipa-cp cloning heuristics.

I added code that compares guessed profile with feedback
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585599.html
and also fixed/improved code to dump stats about profile updates
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585578.html

This gives a bit more of a handle on how realistic the profile is.  The answer
is: not very, in general, but at least for basic blocks containing calls it is
not bad (we guess 0.9 while reality is 0.999).
I am not sure how much better we can do statically since this is such a special
case of backtracking.

Last week we also noticed that with -Ofast we inline the newly produced clones
together, which makes IRA's job a lot harder.  This is done by
-finline-functions-called-once and we tend to inline blocks of 2 or 3 clones,
leading to 18 or 27 nested loops in each.  Simply disabling this optimization
causes another performance hit.
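
To make the loop-depth arithmetic concrete, here is a minimal sketch of the
shape of the problem.  It is not the benchmark code: leaf_clone, middle_clone
and solve are hypothetical names, and each stand-in has only 2 nested loops
where the real clones would have 9 (18 / 2 = 27 / 3 = 9).  Each static function
has exactly one call site, sitting in the innermost loop of its caller, so
inlining it there adds the loop depths together.

```c
#include <assert.h>

/* Hypothetical stand-in for one ipa-cp clone: a 2-deep loop nest
   (the clones discussed above would have 9 levels each).  */
static long leaf_clone(long x)
{
    long sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += x + i + j;
    return sum;
}

/* Second hypothetical clone.  Its innermost loop holds the only call
   to leaf_clone, so inlining that call yields one function with a
   2 + 2 = 4 deep loop nest for the register allocator to handle.  */
static long middle_clone(long x)
{
    long sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += leaf_clone(x + i + j);
    return sum;
}

/* Externally visible entry point with the only call to middle_clone.  */
long solve(long x)
{
    /* Keep inputs small so the sums cannot overflow.  */
    assert(x >= 0 && x < 1000);
    return middle_clone(x);
}
```

Compiling this once with -O2 and once with -O2 -fno-inline-functions-called-once
and comparing the assembly should show whether the two stand-ins were merged
into a single loop nest, although on a toy this small other inlining heuristics
may well inline them regardless.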

I filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103454 and I think
we could teach the inliner not to inline functions called once at large loop
depths, and to restrict the large function growth here, since there are
multiple benchmarks that now degrade because of this.

Worse yet, the heuristic for inlining functions called once is not very smart
and it depends on the order of cgraph_nodes in the linked list, which is a bit
random.

I wonder how the situation looks on AArch64?
