On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +/* Vector Single-Width Averaging Add and Subtract */
> +static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
> +{
> +    uint8_t d = extract64(v, shift, 1);
> +    uint8_t d1;
> +    uint64_t D1, D2;
> +    int mod = env->vxrm;
> +
> +    if (shift == 0 || shift > 64) {
> +        return 0;
> +    }
> +
> +    d1 = extract64(v, shift - 1, 1);
> +    D1 = extract64(v, 0, shift);
> +    if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
> +        return d1;
> +    } else if (mod == 1) { /* round-to-nearest-even */
> +        if (shift > 1) {
> +            D2 = extract64(v, 0, shift - 1);
> +            return d1 & ((D2 != 0) | d);
> +        } else {
> +            return d1 & d;
> +        }
> +    } else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
> +        return !d & (D1 != 0);
> +    }
> +    return 0; /* round-down (truncate) */
> +}
> +
> +static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
> +{
> +    int16_t res = (int16_t)a + (int16_t)b;
> +    uint8_t round = get_round(env, res, 1);
> +    res = (res >> 1) + round;
> +    return res;
> +}
I think this is a suboptimal way to arrange things.  It leaves the vxrm
lookup inside of the main loop, while it is obviously loop invariant.

I think you should have four versions of aadd8, one for each of the
rounding modes.

> +RVVCALL(OPIVV2_ENV, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd8)

Then use this, or something like it, to define four functions containing
the main loops, which will get the helper above inlined.  Then use a final
outermost wrapper to select one of the four functions based on env->vxrm.

r~
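As a standalone sketch of what that restructuring could look like (simplified C outside QEMU, with the fixed shift==1 case folded in; the GEN_VAADD_LOOP macro, the per-mode helper names, and the plain-pointer loop signature are all illustrative, not the real RVVCALL machinery):

```c
#include <stddef.h>
#include <stdint.h>

/* One rounding variant per vxrm mode; each inlines into its own loop. */
static inline int8_t aadd8_rnu(int8_t a, int8_t b)  /* 0: round-to-nearest-up */
{
    int16_t res = (int16_t)a + (int16_t)b;
    return (res >> 1) + (res & 1);
}

static inline int8_t aadd8_rne(int8_t a, int8_t b)  /* 1: round-to-nearest-even */
{
    int16_t res = (int16_t)a + (int16_t)b;
    return (res >> 1) + ((res & 1) & ((res >> 1) & 1));
}

static inline int8_t aadd8_rdn(int8_t a, int8_t b)  /* 2: round-down (truncate) */
{
    int16_t res = (int16_t)a + (int16_t)b;
    return res >> 1;
}

static inline int8_t aadd8_rod(int8_t a, int8_t b)  /* 3: round-to-odd (jam) */
{
    int16_t res = (int16_t)a + (int16_t)b;
    return (res >> 1) | (res & 1);
}

/* Stamp out one main loop per mode, so the helper is inlined and the
 * rounding decision is compiled into the loop body. */
#define GEN_VAADD_LOOP(NAME, OP)                                      \
static void NAME(int8_t *d, const int8_t *a, const int8_t *b,         \
                 size_t n)                                            \
{                                                                     \
    for (size_t i = 0; i < n; i++) {                                  \
        d[i] = OP(a[i], b[i]);                                        \
    }                                                                 \
}

GEN_VAADD_LOOP(vaadd_vv_b_rnu, aadd8_rnu)
GEN_VAADD_LOOP(vaadd_vv_b_rne, aadd8_rne)
GEN_VAADD_LOOP(vaadd_vv_b_rdn, aadd8_rdn)
GEN_VAADD_LOOP(vaadd_vv_b_rod, aadd8_rod)

/* Outermost wrapper: the vxrm lookup happens exactly once, outside the
 * loop, selecting one of the four specialized loops. */
static void vaadd_vv_b(int vxrm, int8_t *d, const int8_t *a,
                       const int8_t *b, size_t n)
{
    static void (*const fns[4])(int8_t *, const int8_t *,
                                const int8_t *, size_t) = {
        vaadd_vv_b_rnu, vaadd_vv_b_rne, vaadd_vv_b_rdn, vaadd_vv_b_rod,
    };
    fns[vxrm & 3](d, a, b, n);
}
```

The per-mode helpers here are the shift==1 specializations of get_round above (vxrm encoding 0=rnu, 1=rne, 2=rdn, 3=rod); a function-pointer table or a switch in the wrapper both work, the point being only that the mode test is hoisted out of the element loop.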