Re: [PATCH v1 03/15] tcg: Fix register allocation constraints

LIU Zhiwei Wed, 14 Aug 2024 00:48:44 -0700


On 2024/8/14 12:18, Richard Henderson wrote:

On 8/14/24 13:30, LIU Zhiwei wrote:
On 2024/8/14 11:08, Richard Henderson wrote:
On 8/14/24 12:27, LIU Zhiwei wrote:
On 2024/8/14 10:04, Richard Henderson wrote:
On 8/14/24 10:58, LIU Zhiwei wrote:
Thus if we want to use all registers of vectors, we have to add a dynamic constraint on register allocation based on IR types.
My comment vs patch 4 is that you can't do that, at least not without large changes to TCG.
In addition, I said that the register pressure on vector regs is not high enough to justify such changes. There is, so far, little benefit in having more than 4 or 5 vector registers, much less 32. Thus 7 (lmul 4, omitting v0) is sufficient.
At least on QEMU, SVE can support a 2048-bit vector length with 'sve-default-vector-length=256'. Software optimized with SVE, such as X264, can benefit from a long SVE length by executing fewer dynamic A64 instructions.
We want to expose all host vector ability. Thus the largest type, TCG_TYPE_V256, is not enough, as 128-bit RVV can give an 8*128=1024 bit wide operation. We have added TCG_TYPE_V512/1024/2048 types (not in this patch set, but we intend to upstream them later). With the large TCG_TYPE_V1024/2048 types, we get better performance on a RISC-V board with far fewer translated RISC-V vector instructions. We can give more detailed experiment results if needed.
However, we will only have 3 vector registers when supporting TCG_TYPE_V1024, and even fewer for TCG_TYPE_V2048. The current approach will give more vector registers to TCG_TYPE_V128 even with support for TCG_TYPE_V1024, which will relax some guest NEON register pressure.
Then you will have to teach TCG about one operand consuming and clobbering N hard registers, so that you get the spills and fills done correctly.
I think we have done this in patch 6.
No, you have not.
There are no modifications to tcg_reg_alloc, and there are no additional calls to tcg_reg_free, which is where spills are generated. There would also need to be changes on the fill side, temp_load.

Thanks. I choose the simple design as you suggest for this patch set. And we will fix this problem when we send the longer vector operations implementation.

I think you should make longer vector operations a longer term project,
Does the longer vector operations implementation deserve to be upstreamed? We can contribute it sooner as it is ready.
Sure.


Good news!

Thanks,
Zhiwei

r~
