On 2024/9/5 12:05, Richard Henderson wrote:
On 9/4/24 07:27, LIU Zhiwei wrote:
From: Swung0x48 <swung0...@outlook.com>

The RISC-V vector instruction set utilizes the LMUL field to group
multiple registers, enabling variable-length vector registers. This
implementation uses only the first register number of each group while
reserving the other register numbers within the group.

In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
host runtime needs to adjust LMUL based on the type to use different
register groups.

This presents challenges for TCG's register allocation. Currently, we
avoid modifying the register allocation part of TCG and only expose the
minimum number of vector registers.

For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
LMUL equal to 4, we use 4 vector registers as one register group. We can
use a maximum of 8 register groups, but the V0 register number is reserved
as a mask register, so we can effectively use at most 7 register groups.
Moreover, even when the type is smaller than TCG_TYPE_V256, we are still
limited to these 7 registers, because TCG cannot yet constrain registers
dynamically by type. Likewise, when the host vlen is 128 bits and the type
is TCG_TYPE_V256, we can use at most 15 registers.
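
The group arithmetic above can be checked with a small standalone sketch. It is not code from the patch: the helper names lmul_for, group_base_mask and usable_groups are hypothetical, and it assumes 32 vector registers with the group containing v0 set aside for the mask register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: LMUL needed to hold type_bits in registers of
 * vlen_bits each. A type that fits in one register uses LMUL = 1. */
static unsigned lmul_for(unsigned vlen_bits, unsigned type_bits)
{
    assert(vlen_bits > 0);
    return type_bits > vlen_bits ? type_bits / vlen_bits : 1;
}

/* Hypothetical helper: bitmask over v0..v31 selecting the first register
 * of each group, skipping the group that contains v0 (the mask register). */
static uint32_t group_base_mask(unsigned lmul)
{
    uint32_t mask = 0;

    assert(lmul >= 1 && lmul <= 8);
    for (unsigned r = lmul; r < 32; r += lmul) {
        mask |= UINT32_C(1) << r;
    }
    return mask;
}

/* Number of register groups left once the v0 group is reserved. */
static unsigned usable_groups(unsigned lmul)
{
    assert(lmul >= 1 && lmul <= 8);
    return 32 / lmul - 1;
}
```

With vlen 64 and a 256-bit type this gives LMUL 4 and 7 usable groups; with vlen 128 it gives LMUL 2 and 15 usable groups, matching the numbers in the description.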

There is not much pressure on vector register allocation in TCG now, so
using 7 registers is feasible and will not have a major impact on code
generation.

This patch:
1. Reserves vector register 0 for use as a mask register.
2. When using register groups, reserves the additional registers within
    each group.

Signed-off-by: TANG Tiancheng <tangtiancheng....@alibaba-inc.com>
Co-authored-by: TANG Tiancheng <tangtiancheng....@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_...@linux.alibaba.com>

This patch does not compile.

../src/tcg/tcg.c:135:13: error: 'tcg_out_dup_vec' used but never defined [-Werror]
  135 | static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
      |             ^~~~~~~~~~~~~~~
../src/tcg/tcg.c:137:13: error: 'tcg_out_dupm_vec' used but never defined [-Werror]
  137 | static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
      |             ^~~~~~~~~~~~~~~~
../src/tcg/tcg.c:139:13: error: 'tcg_out_dupi_vec' used but never defined [-Werror]
  139 | static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
      |             ^~~~~~~~~~~~~~~~
In file included from ../src/tcg/tcg.c:755:
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:516:13: error: 'tcg_out_opc_ldst_vec' defined but not used [-Werror=unused-function]
  516 | static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc, TCGReg data,
      |             ^~~~~~~~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:507:13: error: 'tcg_out_opc_vi' defined but not used [-Werror=unused-function]
  507 | static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
      |             ^~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:501:13: error: 'tcg_out_opc_vx' defined but not used [-Werror=unused-function]
  501 | static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc, TCGReg vd,
      |             ^~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:495:13: error: 'tcg_out_opc_vv' defined but not used [-Werror=unused-function]
  495 | static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc, TCGReg vd,
      |             ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Oops. We failed to compile each patch individually.

Either:
(1) Provide stubs for the functions that are required, and delay implementation
    of the unused functions until the patch(es) that use them.
We will take this approach.
(2) Merge the dup patch so that these functions are defined and implemented,
    which will also provide uses for most of the tcg_out_opc_* functions.


@@ -2100,6 +2174,32 @@ static void tcg_target_init(TCGContext *s)
  {
      tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
      tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+    s->reserved_regs = 0;
+
+    if (cpuinfo & CPUINFO_ZVE64X) {
+        switch (riscv_vlen) {
+        case 64:
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
+            s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS & 0xffffffff00000000);

No need for ().
Use ALL_VECTOR_REGS instead of the immediate integer.
OK.

+            break;
+        case 128:
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
+            s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS & 0xffffffff00000000);
+            break;
+        case 256:
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+            break;
+        default:
+            g_assert_not_reached();

The first host with 512-bit or larger vectors will trigger the assert.

With my suggestion against patch 2, this becomes

    switch (riscv_lg2_vlenb) {
    case TCG_TYPE_V64:
        ...
    case TCG_TYPE_V128:
        ...
    default:
        /* Guaranteed by Zve64x. */
        tcg_debug_assert(riscv_lg2_vlenb >= TCG_TYPE_V256);
    }

Agree.
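
For illustration, here is a minimal standalone sketch of why the log2-based switch tolerates wider hosts. The constants LG2_V64, LG2_V128 and LG2_V256 and the function lmul_for_v256 are hypothetical stand-ins, not QEMU definitions; the sketch assumes the switch value is log2 of the vector length in bytes, so a 64-bit vlen maps to 3.

```c
#include <assert.h>

/* Hypothetical stand-ins for log2(VLEN in bytes): 8, 16 and 32 bytes. */
enum { LG2_V64 = 3, LG2_V128 = 4, LG2_V256 = 5 };

/* LMUL needed to hold a 256-bit vector type on a host whose vector
 * length is (1 << lg2_vlenb) bytes. */
static unsigned lmul_for_v256(unsigned lg2_vlenb)
{
    switch (lg2_vlenb) {
    case LG2_V64:
        return 4;   /* four 64-bit registers per group */
    case LG2_V128:
        return 2;   /* two 128-bit registers per group */
    default:
        /* Anything at least 256 bits wide, including 512 bits and
         * above, holds the type in a single register. */
        assert(lg2_vlenb >= LG2_V256);
        return 1;
    }
}
```

A 512-bit host (lg2_vlenb of 6) reaches the default case and gets LMUL 1 instead of hitting an unreachable-code assertion.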


Thanks,

Zhiwei

r~
