On Tue, 13 May 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, May 13, 2025 12:44 PM
> > To: Eric Botcazou <botca...@adacore.com>
> > Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd
> > <n...@arm.com>
> > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n
> > <requested|preferred> [PR116140]
> >
> > On Tue, 13 May 2025, Eric Botcazou wrote:
> >
> > > > In PR116140 it was brought up that adding pragma GCC unroll in std::find
> > > > makes it so that you can't use a larger unroll factor if you wanted to.
> > > > This is because the value can't be overridden by the other unrolling
> > > > flags such as -funroll-loops.
> > >
> > > What about letting -funroll-loops either augment or use a multiple of the
> > > specified factor?
> >
> > I'm adding my general comment here.  While I think it's reasonable
> > to honor a #pragma unroll during vectorization by trying to adjust
> > the vectorization factor to the suggested unroll factor, adjusting
> > the "remaining" (forced) unroll is probably not always desired,
> > expected or good.
>
> I guess you're referring to the other patch (that's a separate change that I
> think should be debated there because whatever the vectorizer does is
> independent of the scalar unroller).  I can't think of a case where
> not adjusting the remaining forced unrolling is a desirable thing?
>
> In my opinion the pragma is referring to unrolling of the scalar code, not
> vector.  And if the vectorizer has already unrolled the loop, doing additional
> unrolling of the vector code is always going to be slow.
>
> The larger the unroll factor the more preheader statements GCC generates.
> If you have e.g. pragma unroll 16 on a SI loop, the vectorizer already
> unrolls 4 V4SI, for the rtl unroller to then unroll this loop 16 times more,
> means you have VF requirements of 4x V4SI to 64x V4SI for each loop entry.
> Surely the user could not have meant that.
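
To make the arithmetic in the quoted example concrete, here is a small
sketch in C.  The helper names are hypothetical and are not GCC APIs.  The
model is an assumption taken from the example above rather than from the
GCC sources: the vectorizer honors "unroll N" by covering N scalar elements
per iteration, and the RTL unroller then applies the same factor N again to
the already vectorized loop.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC API: number of vectors per iteration if
   the vectorizer honors "unroll N" by covering N scalar elements, with
   LANES elements per vector (4 for V4SI).  */
unsigned
vectors_after_vectorizer (unsigned pragma_unroll, unsigned lanes)
{
  /* Assumed model: the unroll factor is a multiple of the lane count.  */
  assert (lanes > 0 && pragma_unroll % lanes == 0);
  return pragma_unroll / lanes;
}

/* Hypothetical helper: vectors per iteration if the RTL unroller then
   applies the same pragma factor once more to the vector loop.  */
unsigned
vectors_after_rtl_unroll (unsigned pragma_unroll, unsigned lanes)
{
  return vectors_after_vectorizer (pragma_unroll, lanes) * pragma_unroll;
}
```

With pragma_unroll = 16 and lanes = 4 this gives 4 vectors after the
vectorizer and 64 vectors after the second unroll, matching the "4x V4SI to
64x V4SI" figures above.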

I agree - as said, it's a matter of documenting the interaction
between loop optimizations that unroll and #pragma unroll.  Btw,
it probably makes sense to record loop transforms in struct loop
so followup transforms can alter their heuristics accordingly.
As I mentioned there are more passes that perform loop unrolling
as part of their transform.

> >
> > In absence of #pragma unroll the loop unroller has heuristics that
> > might want to incorporate whether a loop was already unrolled
> > from original scalar, but the heuristics should work independent
> > of that.  This is especially true in the context of complete
> > unrolling in cunroll, not so much about the RTL unroller which
> > lacks any good heuristics.
> >
>
> This isn't true, as it has a target hook for costing.  Some targets
> already have some heuristics to unroll small loops, and I'm planning on
> doing the same for AArch64 based on the throughput of the loop.
>
> > The current #pragma unroll is a force thing originally invented
> > to guide the RTL unroller when it is disabled (as it is by default).
> > That it is effectively a "force exact value" is a side-effect of
> > the lack of any different behavior there (before the #pragma it
> > would unroll by 8, always).
> >
> > IMO there's not enough reason to complicate the tunable, much
> > less by "weak" attributes like requested vs. preferred.  I'd
> > rather allow
> >
> > #pragma GCC unroll
> >
> > without a specific unroll factor to suggest GCC should enable
> > unrolling for this loop, but according to heuristics, rather
> > than to a fixed amount (that would be your "preferred" I guess).
>
> The reason for the extra keyword is to *still* get the requested unrolling
> when -funroll-loops is not specified.
>
> With your suggestion the user could never specify a default unroll factor
> for a loop for when `-funroll-loops` is not used.
>
> i.e.
>
> #pragma GCC unroll
> And
> #pragma GCC unroll 4 preferred
>
> Are not the same without -funroll-loops and that's the difference this change
> is trying to realize.

Sure, but 'preferred' is what we have now when not using any keyword?
And with suggested you simply drop the number.

The user currently also cannot specify an alternate default unroll
to be used for the case -funroll-loops is specified without forcing
unrolling when it is not.  That is, there is no fully fine-grained
control available.  And IMO that's OK - such fine-grained control
usually leads to bad performance on most targets.

Richard.

> Thanks,
> Tamar
>
> >
> > Richard.
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
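
For readers without the patch at hand, a minimal sketch of the existing
form of the pragma on a find-style loop may help.  The function find_first
is a hypothetical stand-in for the std::find case from PR116140, not code
from libstdc++.  Only the currently supported `#pragma GCC unroll N` form
is used; the `requested`/`preferred` keywords and the bare
`#pragma GCC unroll` are proposals in this thread and are deliberately left
out of the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: return the index of the first element equal to
   KEY, or N if there is none.  */
size_t
find_first (const int *a, size_t n, int key)
{
  size_t i;

  assert (a != NULL || n == 0);

  /* Existing syntax: ask GCC to unroll the following loop 4 times.  The
     pragma must directly precede the loop it applies to.  */
#pragma GCC unroll 4
  for (i = 0; i < n; i++)
    if (a[i] == key)
      break;

  return i;
}
```

As discussed above, the factor given this way is treated as a forced value,
so -funroll-loops does not raise it for this loop.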