On 8/18/23 17:24, Vineet Gupta wrote:
On 8/18/23 16:08, Jeff Law wrote:
There is some slight regression in code quality for a number of
vector tests where we spill more due to a different instruction order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
Yea. In fact I wouldn't really expect significant changes other than
those key loops in x264.
Care to elaborate a bit more, please? I've seen severe reg pressure /
spills in a bunch of others: cactu, lbm, exchange2. Is there something
specific to the x264 spills?
The only thing that's particularly interesting about the x264 spills is
they're caused by scheduling.
In simplest terms GCC's scheduler tries to minimize the latency of the
critical path in a block. For x264 we've got a loop that we unrolled 8
times with 8 byte-sized loads per loop iteration. So 64 byte loads, all
of higher priority from a critical path latency standpoint than anything
else.
Naturally there's no way we can hold 64 values live as we only have 32
registers and thus we blow out the register file.
By turning on pressure-sensitive scheduling, as register pressure
approaches the threshold, the scheduler will select a lower priority
instruction (say computing the difference of two previously loaded
values) that reduces register pressure. So it's not critical path
optimal, but it keeps us from blowing out the register file and
ultimately we get better performance as a result.
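
To make the difference concrete, here is a toy sketch of the two selection
policies. It is not GCC's scheduler: the Insn class, the priorities, the
"add" instructions that consume each difference, and the limit of
NUM_REGS - 4 are all assumptions made up for illustration. It ignores
latencies and the register holding the running sum, and only tracks how many
temporaries are live as instructions are picked from the ready list.

```python
from dataclasses import dataclass

NUM_REGS = 32  # size of the register file in this toy model


@dataclass
class Insn:
    name: str
    priority: int     # higher = more important for critical path latency
    delta: int        # net change in live temporaries when issued
    deps: tuple = ()  # names of insns that must be issued first


def build_block(n_loads=64):
    """Hypothetical unrolled block: loads, pairwise subs, then accumulates."""
    insns = [Insn(f"load{i}", priority=3, delta=+1) for i in range(n_loads)]
    for i in range(n_loads // 2):
        # A sub consumes two loaded values and produces one difference.
        insns.append(Insn(f"sub{i}", 2, -1, (f"load{2*i}", f"load{2*i+1}")))
    for i in range(n_loads // 2):
        # An add folds one difference into the running sum.
        insns.append(Insn(f"add{i}", 1, -1, (f"sub{i}",)))
    return insns


def schedule(insns, pressure_limit=None):
    """Greedy list scheduling; returns (issue order, peak live temporaries)."""
    pending, done, order = list(insns), set(), []
    live = peak = 0
    while pending:
        ready = [x for x in pending if all(d in done for d in x.deps)]
        # Default policy: highest priority first (critical path driven).
        best = max(ready, key=lambda x: x.priority)
        if pressure_limit is not None and live >= pressure_limit:
            # Near the limit, prefer a ready insn that frees registers,
            # even though its priority is lower.
            relief = [x for x in ready if x.delta < 0]
            if relief:
                best = max(relief, key=lambda x: x.priority)
        pending.remove(best)
        done.add(best.name)
        order.append(best.name)
        live += best.delta
        peak = max(peak, live)
    return order, peak


if __name__ == "__main__":
    block = build_block(64)
    _, peak_latency = schedule(block)
    _, peak_pressure = schedule(block, pressure_limit=NUM_REGS - 4)
    print("critical-path-first peak live values:", peak_latency)
    print("pressure-sensitive peak live values: ", peak_pressure)
```

In this model the critical-path-first order issues all 64 loads up front and
so needs 64 live values, while the pressure-sensitive order interleaves the
subs and adds with the loads and never exceeds the chosen limit.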
jeff