Ronald S. Bultje:
> From: Ronald S. Bultje <rsbul...@gmail.com>
> Sent: May 29, 2024 13:56
> To: Wu Jianhua
> Cc: FFmpeg development discussions and patches; Nuo Mi; James Almer
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
>
> Hi,
>
> On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toq...@outlook.com> wrote:
> > Ronald S. Bultje:
> > > On Wed, May 29, 2024 at 11:38 AM <toq...@outlook.com> wrote:
> > > > +%else
> > > > + vpunpcklqdq m11, m2, m2
> > > > + vpunpckhqdq m12, m2, m2
> > > > + vpunpcklwd m11, m11, m14
> > > > + vpunpcklwd m12, m12, m14
> > > > + paddd m0, m11
> > > > + paddd m1, m12
> > > > + packssdw m0, m0, m1
> > > > +%endif
> > > [..]
> > > Also, the whole thing just emulates a saturated add. Can't you use paddsw
> > > instead of paddw and be done with it? To add to Andreas' question: is
> > > saturating here normatively required?
> >
> > We didn't have any sample that failed for this issue except for the
> > checksum with specific seeds. I think we can keep not changing it until a
> > real sample has something wrong.
> > @Nuomi to get more details.
>
> I think "just" replacing paddw with paddsw is correct, since the input pixels
> are 12bit (so they could be either unsigned or signed), the filtered output
> is the result of packssdw (so signed words), and the desired output is 12bit
> pixels anyway, anything greater than that is clipped to 12bit range. So to
> me, it seems paddsw is a cheaper way to accomplish the same thing.
>
> Ronald
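For readers following the argument above, here is a minimal scalar sketch in C of one 16-bit lane. It is not code from the patch. The helper names (clip_int16, add_widen_pack, add_paddsw, add_paddw, count_mismatches) are made up for illustration, and it assumes the pixel is zero-extended in the widening path and never exceeds 12 bits (0..4095). Under those assumptions a saturating word add gives the same result as widening to 32 bits, adding, and packing with signed saturation, while a plain wrapping add can overflow.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range, as one lane of
 * packssdw does. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One lane of the widen / paddd / packssdw sequence.
 * Assumption: the pixel is zero-extended to 32 bits before the add. */
static int16_t add_widen_pack(int16_t filtered, uint16_t pixel)
{
    return clip_int16((int32_t)filtered + (int32_t)pixel);
}

/* One lane of paddsw: both operands are read as signed words and the
 * sum saturates. For pixel <= 4095 the signed reading equals the
 * unsigned one. */
static int16_t add_paddsw(int16_t filtered, uint16_t pixel)
{
    return clip_int16((int32_t)filtered + (int32_t)(int16_t)pixel);
}

/* One lane of paddw: the sum wraps modulo 2^16. The final conversion
 * to int16_t relies on the usual two's-complement behaviour. */
static int16_t add_paddw(int16_t filtered, uint16_t pixel)
{
    uint16_t wrapped = (uint16_t)((uint16_t)filtered + pixel);
    return (int16_t)wrapped;
}

/* Compare the two saturating variants for every signed 16-bit filter
 * output and a sample of 12-bit pixel values (0, 15, ..., 4095).
 * Returns the number of lanes where they disagree. */
static long count_mismatches(void)
{
    long mismatches = 0;
    for (int pixel = 0; pixel <= 4095; pixel += 15) {
        for (int f = INT16_MIN; f <= INT16_MAX; f++) {
            if (add_widen_pack((int16_t)f, (uint16_t)pixel) !=
                add_paddsw((int16_t)f, (uint16_t)pixel))
                mismatches++;
        }
    }
    return mismatches;
}
```

With a filter output of 32767 and a pixel of 4095, both saturating variants return 32767, whereas the wrapping add returns a negative value. That is the kind of overflow the patch addresses.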

Hi Ronald,

Yes, it does. I've tested paddsw and everything works well. It is indeed the
cheaper way, with minimal performance loss. I've sent v2.

Thanks for this.
Jianhua
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".