On 2024/8/14 10:04, Richard Henderson wrote:
On 8/14/24 10:58, LIU Zhiwei wrote:
Thus if we want to use all registers of vectors, we have to add a
dynamic constraint on register allocation based on IR types.
My comment vs patch 4 is that you can't do that, at least not without
large changes to TCG.
In addition, I said that the register pressure on vector regs is not
high enough to justify such changes. There is, so far, little benefit
in having more than 4 or 5 vector registers, much less 32. Thus 7
(lmul 4, omitting v0) is sufficient.
At least on QEMU, SVE can support a 2048-bit vector length with
'sve-default-vector-length=256'. Software optimized for SVE, such as
X264, can benefit from a long SVE vector length by executing fewer
dynamic A64 instructions.
We want to expose all of the host's vector capability. The largest
existing type, TCG_TYPE_V256, is therefore not enough, as 128-bit RVV
can perform 8*128=1024-bit wide operations. We have extended TCG with
TCG_TYPE_V512/1024/2048 types (not in this patch set, but we intend to
upstream them later).
With the larger TCG_TYPE_V1024/2048 types, we get better performance on
a RISC-V board, with far fewer translated RISC-V vector instructions.
We can provide more detailed experimental results if needed.
However, we will have only 3 vector registers when supporting
TCG_TYPE_V1024, and even fewer for TCG_TYPE_V2048. The current approach
makes more vector registers available for TCG_TYPE_V128 even when
TCG_TYPE_V1024 is supported, which will relieve some guest NEON
register pressure.
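
To make the register counts above concrete, here is a rough sketch of
the arithmetic, not code from the patch set. It assumes VLEN=128, that
the whole register group containing v0 is set aside, and that LMUL is
capped at 8 as in RVV. The helper names lmul_for() and usable_groups()
are made up for illustration.

```python
NUM_VREGS = 32  # RVV architectural vector registers v0..v31
MAX_LMUL = 8    # largest register grouping RVV allows


def lmul_for(type_bits, vlen_bits):
    """Smallest LMUL whose register group holds type_bits.

    Returns None when the type would need LMUL > 8 (assumption: such a
    type simply cannot be offered on this host).
    """
    lmul = max(1, -(-type_bits // vlen_bits))  # ceiling division
    return lmul if lmul <= MAX_LMUL else None


def usable_groups(lmul):
    """Register groups of size lmul, minus the group that contains v0."""
    return NUM_VREGS // lmul - 1


# Hypothetical host with VLEN=128; widths mirror TCG_TYPE_V128..V2048.
for bits in (128, 256, 512, 1024, 2048):
    lmul = lmul_for(bits, 128)
    if lmul is None:
        print(f"V{bits}: needs LMUL > {MAX_LMUL}, not representable")
    else:
        print(f"V{bits}: LMUL={lmul}, usable groups={usable_groups(lmul)}")
```

With a single fixed LMUL of 8, every TCG vector type would be limited
to the 3 groups computed for V1024. Choosing the constraint per IR type
is what lets V128 keep the larger count in the same translation.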
Thanks,
Zhiwei
r~