Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and store instructions

Richard Henderson Thu, 12 Sep 2019 07:27:54 -0700

> +static bool  vector_lmul_check_reg(CPURISCVState *env, uint32_t lmul,
> +        uint32_t reg, bool widen)
> +{
> +    int legal = widen ? (lmul * 2) : lmul;
> +
> +    if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) ||
> +        (lmul == 8 && widen)) {
> +        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
> +        return false;
> +    }
> +
> +    if (reg % legal != 0) {
> +        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
> +        return false;
> +    }
> +    return true;
> +}


These exceptions will not do the right thing.

You cannot call helper_raise_exception from another helper, or from something
called from another helper, as here.  You need to use riscv_raise_exception, as
you do elsewhere in this patch, with a GETPC() value passed down from the
outermost helper.

Ideally you would check these conditions at translate time.
I've mentioned how to do this in reply to your v1.


> +void VECTOR_HELPER(vlbu_v)(CPURISCVState *env, uint32_t nf, uint32_t vm,> +  
>   uint32_t rs1, uint32_t rd)

You should pass the rs1 register by value, not by index.

> +{> +    int i, j, k, vl, vlmax, lmul, width, dest, read;> +> +    vl =
env->vfp.vl;> +> +    lmul   = vector_get_lmul(env);> +    width =
vector_get_width(env);> +    vlmax = vector_get_vlmax(env);> +> +    if
(vector_vtype_ill(env) || vector_overlap_vm_common(lmul, vm, rd)) {> +
riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());> +
return;> +    }> +    if (lmul * (nf + 1) > 32) {> +
riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());> +
return;> +    }
Again, these exceptions should ideally be identified at translate time.

I also think that you should have at least two different helpers: one that
checks the vector mask and one that doesn't.  If you check the above conditions
at translate time then you'll also want to split the helpers based on element
width.

You could also meaningfully split nf == 0 vs nf != 0.  You will, in any case,
need to check at translate time whether the Zvlsseg extension is enabled before
allowing nf != 0.


> +
> +    vector_lmul_check_reg(env, lmul, rd, false);
> +
> +    for (i = 0; i < vlmax; i++) {
> +        dest = rd + (i / (VLEN / width));
> +        j = i % (VLEN / width);

This division is exactly why I suggested making vreg[] one contiguous array of
elements instead of a two-dimensional array.  I think the distinction of 32
VLEN-sized registers should be reserved for cpu dumps and gdbstub.


> +        k = nf;
> +        if (i < env->vfp.vstart) {
> +            continue;

Surely you should hoist this check outside the loop.

> +        } else if (i < vl) {
> +            switch (width) {
> +            case 8:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    while (k >= 0) {
> +                        read = i * (nf + 1)  + k;
> +                        env->vfp.vreg[dest + k * lmul].u8[j] =
> +                            cpu_ldub_data(env, env->gpr[rs1] + read);

You must not modify vreg[x] before you've recognized all possible exceptions,
e.g. validating that a subsequent access will not trigger a page fault.
Otherwise you will have a partially modified register value when the exception
handler is entered.

Without a stride, and without a predicate mask, this can be done with at most
two calls to probe_access (one per page).  This is the simplification that
makes splitting the helper into two very helpful.

With a stride or with a predicate mask requires either
(1) temporary storage for the loads, and copy back to env at the end, or
(2) use probe_access for each load, and then perform the actual loads directly
into env.

FWIW, ARM SVE uses (1), as probe_access is very new.


> +                        k--;
> +                    }
> +                    env->vfp.vstart++;
> +                }
> +                break;
> +            case 16:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    while (k >= 0) {
> +                        read = i * (nf + 1)  + k;
> +                        env->vfp.vreg[dest + k * lmul].u16[j] =
> +                            cpu_ldub_data(env, env->gpr[rs1] + read);

I don't see anything in these assignments to vreg[x].uN[y] that take the
endianness of the host into account.

You need to think about how the architecture defines the overlap of elements --
particularly across vlset -- and make adjustments.

I can imagine, if you have explicit tests for this, your tests are passing
because the architecture defines a little-endian based indexing of the register
file, and you have only run tests on a little-endian host, like x86_64.

For ARM, we define the representation as a little-endian indexed array of
host-endian uint64_t.  This means that a big-endian host needs to adjust the
address of any element smaller than 64-bit.  E.g.

#ifdef HOST_WORDS_BIGENDIAN
#define H1(x)   ((x) ^ 7)
#define H2(x)   ((x) ^ 3)
#define H4(x)   ((x) ^ 1)
#else
#define H1(x)   (x)
#define H2(x)   (x)
#define H4(x)   (x)
#endif

    env->vfp.vreg[reg + k * lmul].u16[H2(j)]


> +            case 64:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    while (k >= 0) {
> +                        read = i * (nf + 1)  + k;
> +                        env->vfp.vreg[dest + k * lmul].u64[j] =
> +                            cpu_ldub_data(env, env->gpr[rs1] + read);
> +                        k--;
> +                    }
> +                    env->vfp.vstart++;
> +                }
> +                break;
> +            default:
> +                riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());

Ideally, this condition is detected at translate time.
You must detect this condition before making any changes to cpu state.
Moreover, the SIGILL should not be skipped because of VSTART.


> +static target_ulong vector_get_index(CPURISCVState *env, int rs1, int rs2,
> +    int index, int mem, int width, int nf)
> +{
> +    target_ulong abs_off, base = env->gpr[rs1];

You should be passing rs1 by value, not by index.

> +    target_long offset;
> +    switch (width) {
> +    case 8:
> +        offset = sign_extend(env->vfp.vreg[rs2].s8[index], 8) + nf * mem;
> +        break;
> +    case 16:
> +        offset = sign_extend(env->vfp.vreg[rs2].s16[index], 16) + nf * mem;
> +        break;
> +    case 32:
> +        offset = sign_extend(env->vfp.vreg[rs2].s32[index], 32) + nf * mem;
> +        break;
> +    case 64:
> +        offset = env->vfp.vreg[rs2].s64[index] + nf * mem;
> +        break;
> +    default:
> +        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
> +        return 0;
> +    }
> +    if (offset < 0) {
> +        abs_off = ~offset + 1;

You have been hanging around hardware people too much.
In software we normally write this "-offset".  ;-)

> +        if (base >= abs_off) {
> +            return base - abs_off;
> +        }
> +    } else {
> +        if ((target_ulong)((target_ulong)offset + base) >= base) {
> +            return (target_ulong)offset + base;
> +        }
> +    }

Why all the extra casting here?  They are exactly what is implied by C.

> +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
> +    return 0;

(1) This exception call won't work, as above,
(2) Where does this condition against wraparound come from?
    I don't see it in the specification.
(3) You certainly cannot detect this after having written a
    previous element to the register file.

[ Skipping lots of functions that are basically the same. ]

> +void VECTOR_HELPER(vsxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm,
> +    uint32_t rs1, uint32_t rs2, uint32_t rd)

Pass rs1 by value.

> +            case 8:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    while (k >= 0) {
> +                        addr = vector_get_index(env, rs1, src2, j, 1, width, 
> k);
> +                        cpu_stb_data(env, addr,
> +                            env->vfp.vreg[dest + k * lmul].s8[j]);

Must probe_access all of the memory before any stores.
Unlike loads, you don't have the option of storing into a temporary.
Which suggests a common subroutine to perform the probe(s), rather
than bother with a temporary for loads.

> +void VECTOR_HELPER(vsuxe_v)(CPURISCVState *env, uint32_t nf, uint32_t vm,
> +    uint32_t rs1, uint32_t rs2, uint32_t rd)
> +{
> +    return VECTOR_HELPER(vsxe_v)(env, nf, vm, rs1, rs2, rd);

You can't do this and expect the GETPC() for the exceptions raised by vsxe_v to
operate properly.  You must define a common helper function and pass in
GETPC(), or preferably not have this second helper function at all.  There's no
reason why you cannot call vsxe_v for implementing vsuxe_v.  It's merely
laziness within the macros you set up in trans_rvv.inc.c.

> +    env->vfp.vstart = 0;
> +}

Dead code after the return, in any case.


r~

Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and store instructions

Reply via email to