https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
--- Comment #41 from richard.sandiford at arm dot com ---
"rguenther at suse dot de" <gcc-bugzi...@gcc.gnu.org> writes:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
>
> --- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
> On Mon, 8 Feb 2021, rsandifo at gcc dot gnu.org wrote:
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98863
>>
>> --- Comment #39 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot
>> gnu.org> ---
>> Just to give an update on this: I have a patch that reduces the
>> amount of memory consumed by fwprop so that it no longer seems
>> to be an outlier. However, it involves doing more bitmap operations.
>> In this testcase we have a large number of registers that
>> seem to be live but unused across a large region of code,
>> so bitmap ANDs with the live-in sets are expensive and hit
>> the worst-case O(nblocks×nregisters). I'm still trying to find
>> a way of reducing the effect of that.
>
> But isn't this what the RD problem does as well (yeah, DF shows
> up as quite compile-time expensive here), and thus all UD/DU chain
> users suffer from the same issue?

Sure, it certainly isn't specific to the RTL-SSA code :-) I just
think we can do better than my current WIP patch does.

> What I didn't explore further is re-doing the way RD numbers defs
> in the bitmaps with the idea that all defs just used inside a
> single BB are not necessary to be represented (the local problems
> take care of them). But that of course only helps if there are
> a significant number of such defs (shadowed by later defs of the same
> reg in the same BB) - which usually should be the case.

Yeah. And I think the problem here is that we have a large number
of non-local defs and uses. It doesn't look like there are an
excessive number of uses per def, just that the defs are live
across a large region before being used.
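[Not part of the original comment: a minimal model, not GCC's actual bitmap code, of why ANDing against every block's live-in set costs O(nblocks×nregisters) when many registers stay live-but-unused across a large region. Bitmaps are modeled as Python ints; the function name and word-counting heuristic are hypothetical.]

```python
def live_in_and_cost(live_in_sets, interesting_regs):
    """AND 'interesting_regs' against every block's live-in bitmap
    (bitmaps modeled as Python ints) and count 64-bit words touched."""
    words_touched = 0
    results = []
    for live_in in live_in_sets:
        # Each AND walks every word of the wider bitmap, even when
        # the intersection rarely removes anything.
        width = max(live_in.bit_length(), interesting_regs.bit_length())
        words_touched += width // 64 + 1
        results.append(live_in & interesting_regs)
    return results, words_touched

# Worst case from the comment: 1000 blocks, 10000 registers all live
# throughout, so every per-block AND is full-width.
nregs = 10_000
all_live = (1 << nregs) - 1
_, cost = live_in_and_cost([all_live] * 1000, all_live)
```

With sparse live-in sets the same loop touches far fewer words, which is why the cost only blows up when defs stay live (but unused) across most of the function.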
> There's extra overhead for re-numbering things of course (but my hope
> was to make the RD problem fit in the cache for a nice speedup...)

Has anyone looked into how we end up in this situation for this
testcase? E.g. did we make bad inlining decisions? Or is it just
a natural consequence of the way the source is written?

We should cope with the situation better regardless, but since
extreme cases like this tend to trigger --param limits, it would
be good to avoid getting into the situation too. :-)

FWIW, as far as compile-time goes, the outlier in a release build
seems to be do_rpo_vn.
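[Not part of the original comment: a sketch of the renumbering idea quoted above, i.e. giving global RD bitmap slots only to defs that survive their block. Any def shadowed by a later def of the same register in the same BB is handled by the local problem and needs no slot. The function name and list-of-registers encoding are hypothetical.]

```python
def nonlocal_def_positions(block_defs):
    """Given the sequence of registers defined in one basic block,
    return the positions of defs that reach the block exit, i.e. the
    last def of each register. Earlier defs of the same register are
    shadowed locally and need no global RD bitmap slot."""
    last_def = {}
    for pos, reg in enumerate(block_defs):
        last_def[reg] = pos  # later defs of 'reg' overwrite earlier ones
    return sorted(last_def.values())
```

As the quoted text notes, this only shrinks the bitmaps when blocks actually contain many shadowed defs; in this testcase the defs are mostly non-local, so the saving would be small.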