On 11/8/18 5:10 AM, Kyrill Tkachov wrote:
> <Sending this on behalf of Richard Sandiford>
> 
> This patch fixes a flaw in the relationship between the way that
> SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
> them in model_order_p.  A comment in model_order_p says:
> 
>   /* Combine the length of the longest path of satisfied true dependencies
>      that leads to each instruction (depth) with the length of the longest
>      path of any dependencies that leads from the instruction (alap).
>      Prefer instructions with the greatest combined length.  If the
> combined
>      lengths are equal, prefer instructions with the greatest depth.
> 
>      The idea is that, if we have a set S of "equal" instructions that each
>      have ALAP value X, and we pick one such instruction I, any
> true-dependent
>      successors of I that have ALAP value X - 1 should be preferred over S.
>      This encourages the schedule to be "narrow" rather than "wide".
>      However, if I is a low-priority instruction that we decided to
>      schedule because of its model_classify_pressure, and if there
>      is a set of higher-priority instructions T, the aforementioned
>      successors of I should not have the edge over T.  */
> 
> The expectation was that scheduling an instruction X would give a
> greater priority to the highest-priority successor instructions Y than
> X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
> giving an equal combined height, but with the greater depth winning as
> a tie-breaker. But this doesn't work if the alap value was ultimately
> determined by an anti-dependence.
> 
> This is particularly bad when --param max-pending-list-length kicks in,
> since we then start adding fake anti-dependencies in order to keep the
> list length down.  These fake dependencies tend to be on the critical
> path.
> 
> The attached patch avoids that by making the alap calculation only
> look at true dependencies.  This shouldn't be too bad, since we use
> INSN_PRIORITY as the final tie-breaker than that does take
> anti-dependencies into account.
> 
> This reduces the number of spills in the hot function from 436.cactusADM
> by 14% on aarch64 at -O3 (and the number of instructions in general).
> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
> Thanks to Wilco for the benchmarking.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Is this ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2018-11-08  Richard Sandiford  <richard.sandif...@arm.com>
> 
> gcc/
>     * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
>     ALAP if the dependence is a true dependence.
So at the least the documentation of the ALAP field would need to be
updated as well as the comment you referenced (the "any dependencies").

But more importantly, it seems like blindly ignoring anti dependencies
is just a hack that happens to work.  I wonder if we could somehow mark
the fake dependencies we add, and avoid bumping the ALAP when we
encounter those fake dependencies.

It probably wouldn't be a bad idea to look at the default for
MAX_PENDING_LIST_LENGTH.  Based on the current default value and the
comments in the code that value could well have been tuned 25 or more
years ago!  How often are we falling over that during a bootstrap and
during spec builds?


Jeff

Reply via email to