On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Averaging Add and Subtract */
> +static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
> +{
> +    uint8_t d, d1;
> +    uint64_t D1, D2;
> +    int mod = env->vxrm;
> +
> +    if (shift == 0 || shift > 64) {
> +        return 0;
> +    }
> +
> +    /* Extract only after validating shift: extract64() asserts on
> +     * out-of-range arguments. */
> +    d = extract64(v, shift, 1);
> +    d1 = extract64(v, shift - 1, 1);
> +    D1 = extract64(v, 0, shift);
> +    if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
> +        return d1;
> +    } else if (mod == 1) { /* round-to-nearest-even */
> +        if (shift > 1) {
> +            D2 = extract64(v, 0, shift - 1);
> +            return d1 & ((D2 != 0) | d);
> +        } else {
> +            return d1 & d;
> +        }
> +    } else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
> +        return !d & (D1 != 0);
> +    }
> +    return 0; /* round-down (truncate) */
> +}
> +
> +static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    int16_t res = (int16_t)a + (int16_t)b;
> +    uint8_t round = get_round(env, res, 1);
> +    res   = (res >> 1) + round;
> +    return res;
> +}

I think this is a suboptimal way to arrange things: it leaves the vxrm lookup
inside the main loop, even though it is obviously loop-invariant.

I think you should have 4 versions of aadd8, one for each of the rounding modes,

> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)

then use this, or something like it, to define 4 functions containing the main
loops, which will get the helper above inlined.

Then use a final outermost wrapper to select one of the 4 functions based on
env->vxrm.


r~
