https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I am sorry for getting back to this late again. This stage1 Martin and I spent some time improving the ipa-cp profile updating and looked again into how realistic the profile is. Codegen has also improved somewhat recently, due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227 and due to modref propagation.

I still believe that for trunk GCC we should not have a patch that intentionally makes the profile unrealistic just to make IRA work better by accident, since it does not seem to help anything in the real world except this somewhat odd benchmark. So I wonder if we can make the profile work better for IRA without actually making it unrealistic and without tampering with the ipa-cp cloning heuristics.

I added code that compares the guessed profile with feedback
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585599.html
and also fixed/improved the code that dumps stats about profile updates
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585578.html

This gives a bit more of a handle on how realistic the profile is. The answer is: not very, in general, but at least for basic blocks containing calls it is not bad (we guess 0.9 while reality is 0.999). I am not sure how much better we can do statically, since this is such a special case of backtracking.

Last week we also noticed that with -Ofast we inline the newly produced clones together, which makes IRA's job a lot harder. This is done by -finline-functions-called-once, and we tend to inline blocks of 2 or 3 clones, leading to 18 or 27 nested loops in each. Simply disabling this optimization causes another performance hit.

I filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103454 and I think we could teach the inliner not to inline functions called once at large loop depths, and restrict large function growth here, since there are multiple benchmarks that now degrade on this. Worse yet, the heuristic for inlining functions called once is not very smart, and it depends on the order of cgraph_nodes in the linked list, which is a bit random.

I wonder how the situation looks on AArch64?
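
To make the loop-depth effect of inlining functions called once concrete, here is a minimal sketch. It is not code from the benchmark or from GCC: the names level1, level2, run and the constant N are hypothetical, and each stand-in "clone" has only 3 loops rather than the 9 implied by the numbers above. If level2 is inlined into its single call site, the body of level1 becomes one 6-deep loop nest that the register allocator has to handle as a single function.

```c
/* Hypothetical illustration; names and sizes are assumptions. */
#define N 3

/* Stand-in for one specialized clone: a 3-deep loop nest. */
static long level2(long acc)
{
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
                acc += a + b + c;
    return acc;
}

/* Calls level2 from exactly one site, inside its own 3-deep nest.
   After inlining, the innermost statement sits at loop depth 6. */
static long level1(long acc)
{
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
                acc = level2(acc);
    return acc;
}

/* Single entry point, so level1 is itself called only once. */
long run(void)
{
    return level1(0);
}
```

Compiling such a file with and without -fno-inline-functions-called-once and comparing the generated code is one way to see whether the nests get merged; whether they actually are depends on the GCC version and the inliner's limits.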