https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 9 Jan 2021, jiangning.liu at amperecomputing dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
>
> --- Comment #7 from Jiangning Liu <jiangning.liu at amperecomputing dot com> ---
> (In reply to rguent...@suse.de from comment #6)
> > On January 9, 2021 4:17:17 AM GMT+01:00, "jiangning.liu at amperecomputing
> > dot com" <gcc-bugzi...@gcc.gnu.org> wrote:
> > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
> > >
> > >--- Comment #5 from Jiangning Liu <jiangning.liu at amperecomputing dot
> > >com> ---
> > >> It has to be done with care of course, cost modeling is difficult
> > >> (we need to have a good estimate of n and m or need to version
> > >> the whole nest). That said, usually we attempt the reverse
> > >> transform.
> > >
> > >Before tuning the cost model good enough, we may implement this
> > >optimization by adding a new optimization command line option.
> > >This won't hurt gcc, right?
> >
> > New options not enabled by default tend to bitrot, be broken from the start
> > and won't be used by the lazy user. So I see no point in doing that.
> >
>
> Understand. I mean we can enable it by default eventually, but we need to
> implement and tune it step by step. It is unrealistic to work out the best
> cost model at the very beginning.

Sure. The "easiest" thing is to rely on a profile from PGO, we did
have some transforms only enabled by -fprofile-use by default. That is,
the cost model needs to be conservative, esp. if you introduce dynamic
allocation for this.

In the end I guess only a variant that versions the nest on the size
of the temporary will be good enough to not trigger OOM or excessive
overhead for small sizes anyway.
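
For illustration only, here is a rough source-level sketch in C of what
"versioning the nest on the size of the temporary" could look like. The
loop nest, the struct and function names, and both thresholds are
hypothetical. They are not taken from the PR's testcase or from any GCC
pass, and a real cost model would pick the bounds from target parameters
or profile data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bounds on the temporary (in elements), illustration only. */
#define TMP_MIN_ELEMS 16u          /* below this the setup cost is not repaid */
#define TMP_MAX_ELEMS (1u << 20)   /* above this we risk excessive memory use */

struct inner { int val; };
struct outer { struct inner *in; };

/* Original nest: the dependent load p[j]->in->val is redone for every i. */
static long nest_plain(struct outer *const *p, const int *a,
                       size_t n, size_t m)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            sum += (long)a[i] * p[j]->in->val;
    return sum;
}

/* Versioned nest: use a heap temporary only when its size is in a range
   where it plausibly pays off; otherwise run the original nest. */
long nest_versioned(struct outer *const *p, const int *a,
                    size_t n, size_t m)
{
    /* Runtime version check on the trip counts / temporary size. */
    if (n < 2 || m < TMP_MIN_ELEMS || m > TMP_MAX_ELEMS)
        return nest_plain(p, a, n, m);

    int *tmp = malloc(m * sizeof *tmp);
    if (tmp == NULL)               /* allocation failed: fall back, no OOM */
        return nest_plain(p, a, n, m);

    /* Gather the outer-loop-invariant dependent loads once. */
    for (size_t j = 0; j < m; j++)
        tmp[j] = p[j]->in->val;

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            sum += (long)a[i] * tmp[j];   /* contiguous loads only */

    free(tmp);
    return sum;
}
```

Both versions compute the same result. The guard only decides whether
the extra allocation and gather loop are worth paying for, which is the
conservative behaviour asked for above: small m skips the temporary,
huge m or a failed malloc falls back to the original nest.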