> Don't clobber v8 here.
>
> Use vsub.vv here to avoid the sequential dependency.
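
For reference, here is a minimal scalar sketch in C of what the second comment amounts to. It is only a model, not the RVV code: the function names are hypothetical, and the tested range of [-1024, 1023] is my assumption, since the sample bit depth is not stated in this thread. The old sequence computes d = a - b, then n = -d, then max(d, n), so the negation has to wait for the subtraction. Computing b - a with a second subtraction gives the same value as -d without depending on d.

```c
#include <stdint.h>

/* Model of the old sequence: vsub.vv, vneg.v, vmax.vv. */
static int16_t absdiff_neg(int16_t a, int16_t b)
{
    int16_t d = (int16_t)(a - b);   /* vsub.vv */
    int16_t n = (int16_t)(-d);      /* vneg.v: needs d first */
    return d > n ? d : n;           /* vmax.vv */
}

/* Model of the new sequence: vsub.vv, vsub.vv, vmax.vv. */
static int16_t absdiff_sub(int16_t a, int16_t b)
{
    int16_t d0 = (int16_t)(a - b);  /* vsub.vv */
    int16_t d1 = (int16_t)(b - a);  /* vsub.vv: independent of d0 */
    return d0 > d1 ? d0 : d1;       /* vmax.vv */
}

/* Returns 1 if both models agree for every pair in the assumed range. */
static int absdiff_models_agree(void)
{
    for (int a = -1024; a < 1024; a++)
        for (int b = -1024; b < 1024; b++)
            if (absdiff_neg((int16_t)a, (int16_t)b) !=
                absdiff_sub((int16_t)a, (int16_t)b))
                return 0;
    return 1;
}
```

Both models return |a - b| as long as the difference fits in 16 bits, so the change affects only the dependency chain and not the result.
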
Updated.

<uk7b-at-foxmail....@ffmpeg.org> wrote on Sat, Dec 21, 2024 at 20:22:

> From: sunyuechi <sunyue...@iscas.ac.cn>
>
> ---
>  libavcodec/riscv/vvc/vvc_sad_rvv.S | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/libavcodec/riscv/vvc/vvc_sad_rvv.S b/libavcodec/riscv/vvc/vvc_sad_rvv.S
> index 341167be1f..f325deee17 100644
> --- a/libavcodec/riscv/vvc/vvc_sad_rvv.S
> +++ b/libavcodec/riscv/vvc/vvc_sad_rvv.S
> @@ -36,20 +36,20 @@ func ff_vvc_sad_rvv_\vlen, zve32x, zbb, zba
>  SADVSET\vlen\w:
>          vsetvlstatic32    \w, \vlen
>          vmv.v.i           v0, 0
> -        vmv.s.x           v24, zero
>          vsetvlstatic16    \w, \vlen
>  SAD\vlen\w:
>          addi              a5, a5, -2
>          vle16.v           v8, (a0)
>          vle16.v           v16, (a1)
> -        vsub.vv           v8, v8, v16
> -        vneg.v            v16, v8
> +        vsub.vv           v24, v8, v16
> +        vsub.vv           v16, v16, v8
>          addi              a0, a0, 2 * 128 * 2
> -        vmax.vv           v8, v8, v16
> -        vwaddu.wv         v0, v0, v8
> +        vmax.vv           v8, v24, v16
>          addi              a1, a1, 2 * 128 * 2
> +        vwaddu.wv         v0, v0, v8
>          bnez              a5, SAD\vlen\w
>          vsetvlstatic32    \w, \vlen
> +        vmv.s.x           v24, zero
>          vredsum.vs        v24, v0, v24
>          vmv.x.s           a0, v24
>          ret
> --
> 2.47.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".