At 2019-12-04 08:59:08, "Song, Ruiling" <ruiling.s...@intel.com> wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of chen
>> Sent: Tuesday, December 3, 2019 4:59 PM
>> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86 SIMD for filter_column()
>>
>> Comments inline in the code.
>>
>>
>> At 2019-12-03 15:52:07, xuju...@sjtu.edu.cn wrote:
>> >From: Xu Jun <xuju...@sjtu.edu.cn>
>[...]
>> >+
>> >+ cvtdq2ps m4, m4
>> >+ mulps m4, m0 ; sum *= rdiv
>> >+ addps m4, m1 ; sum += bias
>>
>> >+ addps m4, m5 ; sum += 0.5
>> I am not sure whether pre-computing (bias+0.5) would cause a precision mismatch.
>I think it is hard to prove that the pre-computation is safe.

Agreed. I was also worried about precision, since the result of floating-point operations depends on their execution order. How about ROUNDPS?

>
>>
>>
>> >+ cvttps2dq m4, m4
>> >+ packssdw m4, m4
>> >+ packuswb m4, m4
>> >+ movss [dstq + dst_offq], m4
>> >+ add c_offq, mmsize/4
>> >+ add dst_offq, mmsize/4
>> >+
>> >+ add off16q, mmsize/4
>> >+ cmp off16q, widthq
>> >+ jl .loop16
>> >+
>> >+ add widthq, rq
>> >+ cmp off16q, widthq
>> >+ jge .paraend
>> >+
>>
>> >+ .loopr:
>> No idea about this loop. If we can read beyond the end, we can reuse the
>> SIMD code above.
>Reusing the SIMD code above may write to memory that does not belong to this
>slice thread.
>IMO, the code to handle the remainder columns is still necessary.

It depends on the algorithm and the size. For example, with width=23:
Process #0 [0:15]
Process #1 [7:22]
Both ranges are 16 columns wide, so both fit the 16-column SIMD path; columns 7..15 are simply computed twice.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
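One lane of the quoted rounding sequence (addps bias, addps 0.5, cvttps2dq, packssdw, packuswb) can be modelled in scalar C. This makes the worry about pre-computing (bias+0.5) concrete.

This is a sketch under stated assumptions, not the patch's or FFmpeg's C code:
- `scaled` stands for the value left in the register after `mulps` (sum * rdiv).
- The helper names are hypothetical.
- The model assumes IEEE-754 single precision with round-to-nearest-even, which is the SSE default.
- It assumes values that fit in int32. CVTTPS2DQ returns 0x80000000 for out-of-range input, whereas the C cast is undefined.

```c
#include <stdint.h>

/* PACKSSDW (int32 -> int16, signed saturation), then PACKUSWB (int16 -> uint8). */
static uint8_t pack_saturate(int32_t v)
{
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Order used by the patch: (scaled + bias) + 0.5f, then truncate toward zero.
 * volatile forces each intermediate to be rounded to float, like an XMM lane. */
static uint8_t round_in_order(float scaled, float bias)
{
    volatile float t = scaled + bias;   /* addps m4, m1 */
    t = t + 0.5f;                       /* addps m4, m5 */
    return pack_saturate((int32_t)t);   /* cvttps2dq + packs (t must fit in int32) */
}

/* Alternative under discussion: (bias + 0.5f) pre-computed once (hypothetical). */
static uint8_t round_precomputed(float scaled, float bias)
{
    volatile float b = bias + 0.5f;     /* would be hoisted out of the loop */
    volatile float t = scaled + b;
    return pack_saturate((int32_t)t);
}
```

The test inputs `scaled = 0.5f - 0x1p-24f` and `bias = 0x1p-25f` are contrived to land on float rounding ties. The exact sum is 1 - 2^-25, whose truncation is 0.
- In the patch's order, the second addition rounds up to 1.0f, so the lane yields 1.
- With (bias + 0.5f) pre-computed, the tie is absorbed in the first addition, so the lane yields 0.

The two orders are therefore not bit-exact with each other. That supports keeping the SIMD evaluation order identical to whatever order the C implementation uses.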
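ROUNDPS with an immediate of 0 rounds to nearest with ties to even. The scalar C sketch below contrasts that mode with the patch's add-0.5-then-truncate step.

The helper names are hypothetical, and `round_nearest_even` is a stand-in for a real ROUNDPS. It uses the add/subtract 2^23 trick, which relies on the default round-to-nearest-even mode and only holds for |x| < 2^23. That range is ample for pixel sums.

```c
/* Patch behaviour: add 0.5, then truncate toward zero (CVTTPS2DQ). */
static int round_add_half_trunc(float x)
{
    volatile float t = x + 0.5f;   /* this addition can itself round up */
    return (int)t;
}

/* Model of ROUNDPS imm8 = 0 (nearest, ties to even) for |x| < 2^23:
 * adding 2^23 pushes the fraction bits out, so the FPU's default rounding
 * mode picks the integer; subtracting 2^23 again is exact. */
static int round_nearest_even(float x)
{
    volatile float t;
    if (x >= 0.0f) {
        t = x + 0x1p23f;
        t = t - 0x1p23f;
    } else {
        t = x - 0x1p23f;
        t = t + 0x1p23f;
    }
    return (int)t;
}
```

Away from halfway points the two agree. They differ in three cases:
- At exact ties: 2.5 gives 3 versus 2.
- Just below 0.5: `0.5f - 0x1p-25f` gives 1 versus 0, because `x + 0.5f` itself rounds up to 1.0f.
- For negative inputs: -1.7 gives -1 versus -2. PACKUSWB clamps both to 0 anyway.

Switching to ROUNDPS would therefore change some outputs relative to an add-0.5 C reference, unless the C side changes too.

Assuming MXCSR is left at its default, CVTPS2DQ (without the T) already converts with round-to-nearest-even. A separate ROUNDPS would not be needed for that mode.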
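The overlapping-tail idea from the last reply can be sketched in C. The final full-width block is shifted back so it ends exactly at the slice boundary, so no write leaves the slice.

Assumptions, none of which come from the patch:
- Each slice owns columns [start, end).
- `dst` and `src` are different buffers, so recomputing a column writes the same value again.
- `block16()` stands in for one 16-column SIMD iteration.
- `BLOCK`, `kernel()` and the helper names are hypothetical.

```c
#include <stdint.h>

#define BLOCK 16   /* hypothetical SIMD width in columns, as in the width=23 example */

/* Hypothetical per-column arithmetic standing in for the real filter. */
static uint8_t kernel(uint8_t v)
{
    int r = v * 3 / 2 + 1;
    return (uint8_t)(r > 255 ? 255 : r);
}

/* Stand-in for one SIMD iteration: always writes exactly BLOCK columns. */
static void block16(uint8_t *dst, const uint8_t *src, int x)
{
    for (int i = 0; i < BLOCK; i++)
        dst[x + i] = kernel(src[x + i]);
}

/* Process columns [start, end) of one slice without touching other slices. */
static void process_slice(uint8_t *dst, const uint8_t *src, int start, int end)
{
    if (end - start < BLOCK) {          /* narrower than one block: scalar path */
        for (int x = start; x < end; x++)
            dst[x] = kernel(src[x]);
        return;
    }
    int x;
    for (x = start; x + BLOCK <= end; x += BLOCK)
        block16(dst, src, x);           /* full blocks */
    if (x < end)                        /* remainder: overlap the previous block */
        block16(dst, src, end - BLOCK);
}
```

For start = 0 and end = 23, the main loop covers [0:15] and the tail block covers [7:22], matching the example in the reply. Ruiling's point still holds for slices narrower than one block: there is no in-slice window to shift back to, so some scalar (or masked) remainder code remains necessary in that case.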