Re: [PATCH v1 03/15] tcg: Fix register allocation constraints

Richard Henderson Tue, 13 Aug 2024 20:09:43 -0700

On 8/14/24 12:27, LIU Zhiwei wrote:

On 2024/8/14 10:04, Richard Henderson wrote:
On 8/14/24 10:58, LIU Zhiwei wrote:
Thus, if we want to use all of the vector registers, we have to add a dynamic constraint on register allocation based on the IR type.
My comment on patch 4 is that you can't do that, at least not without large changes to TCG.
In addition, I said that the register pressure on vector registers is not high enough to justify such changes. So far, there is little benefit in having more than 4 or 5 vector registers, much less 32. Thus 7 (lmul 4, omitting v0) is sufficient.
At least on QEMU, SVE can support a 2048-bit vector length with 'sve-default-vector-length=256'. Software optimized for SVE, such as X264, can benefit from a long SVE length by executing fewer dynamic A64 instructions.
We want to expose all of the host's vector capability. Thus the largest type, TCG_TYPE_V256, is not enough, as 128-bit RVV can provide 8*128=1024-bit wide operations. We have added TCG_TYPE_V512/1024/2048 types (not in this patch set, but we intend to upstream them later). With the large TCG_TYPE_V1024/2048 types, we get better performance on a RISC-V board with far fewer translated RISC-V vector instructions. We can provide more detailed experimental results if needed.
However, we will only have 3 vector registers when supporting TCG_TYPE_V1024, and even fewer for TCG_TYPE_V2048. The current approach gives us more TCG_TYPE_V128 vectors even while supporting TCG_TYPE_V1024, which will relieve some guest NEON register pressure.

Then you will have to teach TCG about one operand consuming and clobbering N hard registers, so that you get the spills and fills done correctly.


But you haven't done that in this patch set, so it will currently generate incorrect code.

I think you should make longer vector operations a longer-term project, and start with something simpler.

r~
