> +static bool vector_lmul_check_reg(CPURISCVState *env, uint32_t lmul, > + uint32_t reg, bool widen) > +{ > + int legal = widen ? (lmul * 2) : lmul; > + > + if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) || > + (lmul == 8 && widen)) { > + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); > + return false; > + } > + > + if (reg % legal != 0) { > + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); > + return false; > + } > + return true; > +}
These exceptions will not do the right thing. You cannot call helper_raise_exception from another helper, or from something called from another helper, as here. You need to use riscv_raise_exception, as you do elsewhere in this patch, with a GETPC() value passed down from the outermost helper. Ideally you would check these conditions at translate time. I've mentioned how to do this in reply to your v1. > +void VECTOR_HELPER(vlbu_v)(CPURISCVState *env, uint32_t nf, uint32_t vm,> + > uint32_t rs1, uint32_t rd) You should pass the rs1 register by value, not by index. > +{> + int i, j, k, vl, vlmax, lmul, width, dest, read;> +> + vl = env->vfp.vl;> +> + lmul = vector_get_lmul(env);> + width = vector_get_width(env);> + vlmax = vector_get_vlmax(env);> +> + if (vector_vtype_ill(env) || vector_overlap_vm_common(lmul, vm, rd)) {> + riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());> + return;> + }> + if (lmul * (nf + 1) > 32) {> + riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());> + return;> + } Again, these exceptions should ideally be identified at translate time. I also think that you should have at least two different helpers: one that checks the vector mask and one that doesn't. If you check the above conditions at translate time then you'll also want to split the helpers based on element width. You could also meaningfully split nf == 0 vs nf != 0. You will, in any case, need to check at translate time whether the Zvlsseg extension is enabled before allowing nf != 0. > + > + vector_lmul_check_reg(env, lmul, rd, false); > + > + for (i = 0; i < vlmax; i++) { > + dest = rd + (i / (VLEN / width)); > + j = i % (VLEN / width); This division is exactly why I suggested making vreg[] one contiguous array of elements instead of a two-dimensional array. I think the distinction of 32 VLEN-sized registers should be reserved for cpu dumps and gdbstub. > + k = nf; > + if (i < env->vfp.vstart) { > + continue; Surely you should hoist this check outside the loop. > + } else if (i < vl) { > + switch (width) { > + case 8: > + if (vector_elem_mask(env, vm, width, lmul, i)) { > + while (k >= 0) { > + read = i * (nf + 1) + k; > + env->vfp.vreg[dest + k * lmul].u8[j] = > + cpu_ldub_data(env, env->gpr[rs1] + read); You must not modify vreg[x] before you've recognized all possible exceptions, e.g. validating that a subsequent access will not trigger a page fault. Otherwise you will have a partially modified register value when the exception handler is entered. Without a stride, and without a predicate mask, this can be done with at most two calls to probe_access (one per page). This is the simplification that makes splitting the helper into two very helpful. With a stride or with a predicate mask requires either (1) temporary storage for the loads, and copy back to env at the end, or (2) use probe_access for each load, and then perform the actual loads directly into env. FWIW, ARM SVE uses (1), as probe_access is very new. > + k--; > + } > + env->vfp.vstart++; > + } > + break; > + case 16: > + if (vector_elem_mask(env, vm, width, lmul, i)) { > + while (k >= 0) { > + read = i * (nf + 1) + k; > + env->vfp.vreg[dest + k * lmul].u16[j] = > + cpu_ldub_data(env, env->gpr[rs1] + read); I don't see anything in these assignments to vreg[x].uN[y] that take the endianness of the host into account. You need to think about how the architecture defines the overlap of elements -- particularly across vlset -- and make adjustments. I can imagine, if you have explicit tests for this, your tests are passing because the architecture defines a little-endian based indexing of the register file, and you have only run tests on a little-endian host, like x86_64. For ARM, we define the representation as a little-endian indexed array of host-endian uint64_t. This means that a big-endian host needs to adjust the address of any element smaller than 64-bit. E.g. #ifdef HOST_WORDS_BIGENDIAN #define H1(x) ((x) ^ 7) #define H2(x) ((x) ^ 3) #define H4(x) ((x) ^ 1) #else #define H1(x) (x) #define H2(x) (x) #define H4(x) (x) #endif env->vfp.vreg[reg + k * lmul].u16[H2(j)] > + case 64: > + if (vector_elem_mask(env, vm, width, lmul, i)) { > + while (k >= 0) { > + read = i * (nf + 1) + k; > + env->vfp.vreg[dest + k * lmul].u64[j] = > + cpu_ldub_data(env, env->gpr[rs1] + read); > + k--; > + } > + env->vfp.vstart++; > + } > + break; > + default: > + riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC()); Ideally, this condition is detected at translate time. You must detect this condition before making any changes to cpu state. Moreover, the SIGILL should not be skipped because of VSTART. > +static target_ulong vector_get_index(CPURISCVState *env, int rs1, int rs2, > + int index, int mem, int width, int nf) > +{ > + target_ulong abs_off, base = env->gpr[rs1]; You should be passing rs1 by value, not by index. > + target_long offset; > + switch (width) { > + case 8: > + offset = sign_extend(env->vfp.vreg[rs2].s8[index], 8) + nf * mem; > + break; > + case 16: > + offset = sign_extend(env->vfp.vreg[rs2].s16[index], 16) + nf * mem; > + break; > + case 32: > + offset = sign_extend(env->vfp.vreg[rs2].s32[index], 32) + nf * mem; > + break; > + case 64: > + offset = env->vfp.vreg[rs2].s64[index] + nf * mem; > + break; > + default: > + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); > + return 0; > + } > + if (offset < 0) { > + abs_off = ~offset + 1; You have been hanging around hardware people too much. In software we normally write this "-offset". ;-) > + if (base >= abs_off) { > + return base - abs_off; > + } > + } else { > + if ((target_ulong)((target_ulong)offset + base) >= base) { > + return (target_ulong)offset + base; > + } > + } Why all the extra casting here? They are exactly what is implied by C. > + helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); > + return 0; (1) This exception call won't work, as above, (2) Where does this condition against wraparound come from? I don't see it in the specification. (3) You certainly cannot detect this after having written a previous element to the register file. [ Skipping lots of functions that are basically the same. ] > +void VECTOR_HELPER(vsxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm, > + uint32_t rs1, uint32_t rs2, uint32_t rd) Pass rs1 by value. > + case 8: > + if (vector_elem_mask(env, vm, width, lmul, i)) { > + while (k >= 0) { > + addr = vector_get_index(env, rs1, src2, j, 1, width, > k); > + cpu_stb_data(env, addr, > + env->vfp.vreg[dest + k * lmul].s8[j]); Must probe_access all of the memory before any stores. Unlike loads, you don't have the option of storing into a temporary. Which suggests a common subroutine to perform the probe(s), rather than bother with a temporary for loads. > +void VECTOR_HELPER(vsuxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm, > + uint32_t rs1, uint32_t rs2, uint32_t rd) > +{ > + return VECTOR_HELPER(vsxe_v)(env, nf, vm, rs1, rs2, rd); You can't do this and expect the GETPC() for the exceptions raised by vsxe_v to operate properly. You must define a common helper function and pass in GETPC(), or preferably not have this second helper function at all. There's no reason why you cannot call vsxe_v for implementing vsuxe_v. It's merely laziness within the macros you set up in trans_rvv.inc.c. > + env->vfp.vstart = 0; > +} Dead code after the return, in any case. r~