> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of chen
> Sent: Wednesday, December 4, 2019 9:36 AM
> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86
> SIMD for filter_column()
>
> At 2019-12-04 08:59:08, "Song, Ruiling" <ruiling.s...@intel.com> wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of chen
> >> Sent: Tuesday, December 3, 2019 4:59 PM
> >> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> >> Subject: Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86
> >> SIMD for filter_column()
> >>
> >> comments inline in code
> >>
> >> At 2019-12-03 15:52:07, xuju...@sjtu.edu.cn wrote:
> >> >From: Xu Jun <xuju...@sjtu.edu.cn>
> >[...]
> >> >+
> >> >+    cvtdq2ps m4, m4
> >> >+    mulps    m4, m0     ; sum *= rdiv
> >> >+    addps    m4, m1     ; sum += bias
> >>
> >> >+    addps    m4, m5     ; sum += 0.5
> >> I don't know how about precision mismatch if we pre-compute (bias+0.5)
> >
> >I think it is hard to prove it is safe to do pre-compute.
> Agree, I also worried precision issue since float operator is execute order
> dependent.
> How about ROUNDPS? Seems no exactly match.
>
> >>
> >> >+    cvttps2dq m4, m4
> >> >+    packssdw  m4, m4
> >> >+    packuswb  m4, m4
> >> >+    movss     [dstq + dst_offq], m4
> >> >+    add       c_offq, mmsize/4
> >> >+    add       dst_offq, mmsize/4
> >> >+
> >> >+    add       off16q, mmsize/4
> >> >+    cmp       off16q, widthq
> >> >+    jl        .loop16
> >> >+
> >> >+    add       widthq, rq
> >> >+    cmp       off16q, widthq
> >> >+    jge       .paraend
> >> >+
> >>
> >> >+ .loopr:
> >> no idea about this loop, if we can read beyond, we can reuse above SIMD
> >> code
> >Reuse above SIMD code may write to the memory that does not belong to
> >this slice-thread.
> >IMO, the code to handle remainder columns is still necessary.
>
> Depends on algorithm & size,
> For example width=23
> Process #0 [0:15]
> Process #1 [7:22]
> Both of them is multiple of 16

Sounds interesting, but FFmpeg does not do it this way at the moment.
One question: would there be a penalty for writing to the same memory
addresses (both passes write to 7-15) from different threads?
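For reference, the width=23 example above can be sketched in plain C. This is
only an illustration of the overlapping-block idea, not code from the patch:
process_block() is a scalar stand-in for one 16-pixel SIMD iteration, op() is
a made-up per-pixel operation, and the sketch assumes width >= 16 and that
src and dst are separate buffers (with in-place processing the overlapped
pixels would be transformed twice).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 16, MAX_WIDTH = 64 };  /* BLOCK: pixels per (pretend) SIMD iteration */

/* Hypothetical per-pixel operation; any pure function of the source pixel works. */
static uint8_t op(uint8_t v) { return (uint8_t)(255 - v); }

/* Stand-in for the SIMD loop body: always handles exactly BLOCK pixels. */
static void process_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = op(src[i]);
}

/* Process `width` pixels using full blocks only.
 * Assumes width >= BLOCK and that src and dst do not alias. */
static void process_row_overlapped(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t x = 0;

    assert(width >= BLOCK);
    for (; x + BLOCK <= width; x += BLOCK)
        process_block(dst + x, src + x);

    /* Tail: redo the last BLOCK pixels so the final block ends exactly at
     * width. For width=23 this is [7:22], overlapping [0:15] on 7..15. */
    if (x < width)
        process_block(dst + width - BLOCK, src + width - BLOCK);
}

/* Returns 1 if the overlapped version matches a scalar loop and leaves
 * everything at index >= width untouched, 0 otherwise. */
static int overlapped_matches_scalar(size_t width)
{
    uint8_t src[MAX_WIDTH], dst[MAX_WIDTH], ref[MAX_WIDTH];
    size_t i;

    if (width < BLOCK || width > MAX_WIDTH)
        return 0;
    for (i = 0; i < MAX_WIDTH; i++) {
        src[i] = (uint8_t)(i * 7 + 3);
        dst[i] = 0xAA;  /* sentinel: must survive beyond width */
        ref[i] = 0xAA;
    }
    for (i = 0; i < width; i++)
        ref[i] = op(src[i]);

    process_row_overlapped(dst, src, width);
    return memcmp(dst, ref, MAX_WIDTH) == 0;
}
```

In this sketch both blocks are written by the same caller and no store lands
outside [0, width). Whether that carries over to the slice-threaded filter
depends on how the columns are divided between threads, which the sketch does
not model.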
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".