On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
>> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <wschm...@linux.ibm.com> wrote:
>>> We've done some experimenting and realized that the subject option almost
>>> always provides improved performance for Power when the loop unroller is
>>> enabled.  So this patch turns that flag on by default for us.
>> I guess it creates more freedom for combine (more single-uses) and register
>> allocation.  I wonder in which cases this might pessimize things?  I guess
>> the pre-RA scheduler might make RA's life harder by creating overlapping
>> live ranges.
>>
>> I guess you didn't actually investigate the nature of the improvements you 
>> saw?
> It shortens dependency chains by a factor equal to the unroll
> factor.  I do not know why this doesn't help a lot everywhere.  It of
> course raises register pressure, maybe that is just it?

Right, it's all about breaking dependencies to more efficiently exploit
the microarchitecture.  By default, variable expansion in GCC is quite
conservative, creating only two reduction streams out of one, so it
rarely causes spills.  This can be adjusted upwards with
--param max-variable-expansions-in-unroller=n.  Our experiments show
that raising n to 4 starts to cause some minor degradations, which are
almost certainly due to register pressure, so the default setting looks
appropriate.
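
To make the effect concrete, here is a hand-written, source-level sketch
of what two reduction streams look like.  It is only an illustration:
the function names are made up for this example, and GCC performs the
transformation on its internal representation during unrolling rather
than on C source.  An integer reduction is used so that reassociating
the adds cannot change the result.

```c
#include <stddef.h>

/* Original reduction: each add depends on the previous one, so the
   loop forms a single serial dependency chain of length n.  */
long sum_single(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by two with the accumulator expanded into two streams.
   The two adds in the body are independent of each other, so each
   chain is about half as long.  The cost is one extra live register.  */
long sum_expanded(const long *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s0 += a[i];

    return s0 + s1;     /* combine the streams after the loop */
}
```

Each additional stream shortens the chains further but keeps one more
accumulator live across the loop, which is where the register pressure
at higher n comes from.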
>
>> Do we want to adjust the flags documentation, saying whether this is enabled
>> by default depends on the target (or even list them)?
> Good idea, thanks.

OK, I'll update the docs and make the change that Segher requested. 
Thanks for the reviews!

Bill
>
>
> Segher
