On 10/30/24 7:06 PM, Jeff Law wrote:
So this patch is a very conservative approach to eliminate more vsetvl instructions.

As we know, scheduling can scramble the instruction stream based on a variety of factors and can easily result in an instruction sequence where we ping-pong between different vector configurations, thus resulting in more vsetvl instructions than if we had issued instructions in a slightly different order.

Robin did some experiments with vsetvl aware scheduling several weeks ago.  I believe he was adjusting the priority or cost of the insns based on their vsetvl needs.  This experiment actually showed worse performance on our design, which is a good indicator that scheduling for latency and functional hazards is more important than eliminating vsetvl instructions (not a surprise obviously).

That experiment greatly influenced this implementation.  Specifically we don't adjust cost/priority of any insns.  Instead we use vector configuration information to potentially swap two insns in the scheduler's ready queue iff swapping would result in avoiding a vsetvl and the two insns being swapped in the ready queue have the same priority/cost.

So it's quite conservative, essentially using vector configuration as a sort key of last resort and only for the highest priority insns in the ready queue and only when it's going to let us eliminate a vsetvl.

For something like SATD we eliminate a few gratuitous vsetvl instructions.  As expected it makes no performance difference in the pico benchmarks we've built on our design.  The main goal here is to side step questions about the over-active changing of vector configurations.

My BPI is busy, so I don't have performance data on that design, but it does result in a ~1% drop in dynamic instructions.  On a high performance design I wouldn't expect this to help performance in an significant way, but it does make the code look better.

Bootstrapped and regression tested on riscv64-linux-gnu, also regression tested in my tester for rv32 and rv64 elf configurations.


I'll wait for the pre-commit tester to render a verdict *and* for others who know the vsetvl code to have time to chime in.  Assuming clean from pre-commit and no negative comments the plan would be to commit next week sometime.
So I'm putting this on hold. I'm seeing unexpected performance regressions when I ran this on the Ventana design. Not really sure what's going on and given this was supposed to be neutral, something's clearly not right. That would tend to indicate something is goofy in the priorities.

The other possibility is the swap step. It would have been better to slide the entries into the hold vacated by the insn we want to move to the head of the list -- that permutes the schedule a bit less. But if that's making a performance difference, again that would be a sign that priorities are goofy.

Either way, I'm putting this on ice for now. Hopefully I'll come back to it (or someone else can take a looksie) in the not terribly distant future.

jeff

Reply via email to