On 10/30/24 7:06 PM, Jeff Law wrote:
So this patch is a very conservative approach to eliminate more vsetvl
instructions.
As we know, scheduling can scramble the instruction stream based on a
variety of factors and can easily result in an instruction sequence
where we ping-pong between different vector configurations, thus
resulting in more vsetvl instructions than if we had issued instructions
in a slightly different order.
Robin did some experiments with vsetvl aware scheduling several weeks
ago. I believe he was adjusting the priority or cost of the insns based
on their vsetvl needs. This experiment actually showed worse
performance on our design, which is a good indicator that scheduling for
latency and functional hazards is more important than eliminating vsetvl
instructions (not a surprise obviously).
That experiment greatly influenced this implementation. Specifically we
don't adjust cost/priority of any insns. Instead we use vector
configuration information to potentially swap two insns in the
scheduler's ready queue iff swapping would result in avoiding a vsetvl
and the two insns being swapped in the ready queue have the same
priority/cost.
So it's quite conservative, essentially using vector configuration as a
sort key of last resort and only for the highest priority insns in the
ready queue and only when it's going to let us eliminate a vsetvl.
For something like SATD we eliminate a few gratuitous vsetvl
instructions. As expected it makes no performance difference in the
pico benchmarks we've built on our design. The main goal here is to
side step questions about the over-active changing of vector
configurations.
My BPI is busy, so I don't have performance data on that design, but it
does result in a ~1% drop in dynamic instructions. On a high
performance design I wouldn't expect this to help performance in an
significant way, but it does make the code look better.
Bootstrapped and regression tested on riscv64-linux-gnu, also regression
tested in my tester for rv32 and rv64 elf configurations.
I'll wait for the pre-commit tester to render a verdict *and* for others
who know the vsetvl code to have time to chime in. Assuming clean from
pre-commit and no negative comments the plan would be to commit next
week sometime.
So I'm putting this on hold. I'm seeing unexpected performance
regressions when I ran this on the Ventana design. Not really sure
what's going on and given this was supposed to be neutral, something's
clearly not right. That would tend to indicate something is goofy in
the priorities.
The other possibility is the swap step. It would have been better to
slide the entries into the hold vacated by the insn we want to move to
the head of the list -- that permutes the schedule a bit less. But if
that's making a performance difference, again that would be a sign that
priorities are goofy.
Either way, I'm putting this on ice for now. Hopefully I'll come back
to it (or someone else can take a looksie) in the not terribly distant
future.
jeff