On 8/18/23 17:24, Vineet Gupta wrote:


On 8/18/23 16:08, Jeff Law wrote:
There is some slight regression in code quality for a number of
vector tests where we spill more due to a different instruction order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
Yea.  In fact I wouldn't really expect significant changes other than those key loops in x264.

Care to elaborate a bit more, please? I've seen severe reg pressure / spills in a bunch of others: cactu, lbm, exchange2. Is there something specific to the x264 spills?
The only thing that's particularly interesting about the x264 spills is they're caused by scheduling.

In simplest terms, GCC's scheduler tries to minimize the latency of the critical path in a block. For x264 we've got a loop that we unrolled 8 times, with 8 byte-sized loads per loop iteration. That gives 64 byte loads, each of which ranks higher from a critical-path latency standpoint than anything else in the block, so the scheduler wants to issue them all first.

Naturally there's no way we can hold 64 values live as we only have 32 registers and thus we blow out the register file.

By turning on pressure-sensitive scheduling, as register pressure approaches the threshold, the scheduler will select a lower-priority instruction (say, computing the difference of two previously loaded values) that reduces register pressure. So it's not critical-path optimal, but it keeps us from blowing out the register file, and ultimately we get better performance as a result.
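
To make the effect concrete, here is a toy model of the behaviour described above. It is only a sketch under stated assumptions, not GCC's actual scheduler or its pressure heuristics: the function `schedule`, the `pressure_limit` parameter, and the "diff" instruction that consumes a pair of loaded values are all hypothetical names made up for illustration. Loads always outrank diffs (standing in for the critical-path priority), and the accumulator that receives each difference is ignored when counting live values.

```python
def schedule(num_pairs, pressure_limit=None):
    """Greedy list scheduling of a toy block; returns (order, peak_live).

    The block has 2 * num_pairs loads and one "diff" per pair of loads.
    Loads have the higher priority. If pressure_limit is set (hypothetical
    knob), a ready diff is picked instead once the number of live loaded
    values reaches the limit.
    """
    pending_loads = list(range(2 * num_pairs))
    loaded = set()       # indices of loads already issued
    done_diffs = set()   # pairs whose diff has been issued
    live = 0             # loaded values not yet consumed by a diff
    peak = 0
    order = []

    while len(done_diffs) < num_pairs:
        # A diff is ready once both of its loads have been issued.
        ready_diffs = [p for p in range(num_pairs)
                       if p not in done_diffs
                       and 2 * p in loaded and 2 * p + 1 in loaded]
        under_pressure = pressure_limit is not None and live >= pressure_limit

        if ready_diffs and (under_pressure or not pending_loads):
            # Lower-priority pick: consumes two live values.
            p = ready_diffs[0]
            done_diffs.add(p)
            live -= 2
            order.append(f"diff{p}")
        else:
            # Highest-priority pick: another load, one more live value.
            i = pending_loads.pop(0)
            loaded.add(i)
            live += 1
            order.append(f"load{i}")
        peak = max(peak, live)

    return order, peak


# 32 pairs = 64 loads, as in the unrolled loop discussed above.
_, peak_plain = schedule(32)
_, peak_pressure = schedule(32, pressure_limit=24)
print("peak live values, priority only:     ", peak_plain)
print("peak live values, pressure-sensitive:", peak_pressure)
```

In this model the priority-only schedule issues all 64 loads up front and peaks at 64 live values, well past a 32-entry register file, while the pressure-aware variant interleaves diffs with loads and never exceeds the chosen limit of 24. The limit of 24 is an arbitrary value picked for the example.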

jeff
