vf_convolution: add X86 SIMD for filter_column()

chen Tue, 03 Dec 2019 17:37:45 -0800


At 2019-12-04 08:59:08, "Song, Ruiling" <ruiling.s...@intel.com> wrote:
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of
>> chen
>> Sent: Tuesday, December 3, 2019 4:59 PM
>> To: FFmpeg development discussions and patches <ffmpeg-
>> de...@ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86
>> SIMD for filter_column()
>> 
>> comments inline in code
>> 
>> 
>> At 2019-12-03 15:52:07, xuju...@sjtu.edu.cn wrote:
>> >From: Xu Jun <xuju...@sjtu.edu.cn>
>[...]
>> >+
>> >+        cvtdq2ps m4, m4
>> >+        mulps m4, m0     ; sum *= rdiv
>> >+        addps m4, m1     ; sum += bias
>> 
>> >+        addps m4, m5     ; sum += 0.5
>> I am not sure whether pre-computing (bias+0.5) would cause a precision mismatch


>I think it is hard to prove it is safe to do pre-compute.
Agreed. I am also worried about precision, since the result of floating-point
operations depends on the order in which they are executed.
How about ROUNDPS?
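
A minimal sketch in C of why the order matters, not taken from the patch. The function names are hypothetical, and the input is deliberately far outside the pixel range so the result can be checked by hand: at 2^23 the spacing between adjacent floats is 1, so adding 0.5 twice is lost to rounding, while adding a pre-computed 1.0 is exact.

```c
/* Order used in the quoted asm: sum += bias, then sum += 0.5, then truncate.
 * volatile forces each intermediate to be rounded to single precision. */
static int round_two_adds(float x, float bias)
{
    volatile float t = x + bias;
    volatile float r = t + 0.5f;
    return (int)r;              /* truncation, like cvttps2dq */
}

/* Proposed shortcut: fold bias and 0.5 into one constant first. */
static int round_precomputed(float x, float bias)
{
    volatile float k = bias + 0.5f;
    volatile float r = x + k;
    return (int)r;
}
```

With x = 8388608.0f (2^23) and bias = 0.5f, round_two_adds() gives 8388608 and round_precomputed() gives 8388609. Whether inputs in the real 8-bit range can hit a similar case is not shown here.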


>
>> 
>> 
>> >+        cvttps2dq m4, m4
>> >+        packssdw m4, m4
>> >+        packuswb m4, m4
>> >+        movss [dstq + dst_offq], m4
>> >+        add c_offq, mmsize/4
>> >+        add dst_offq, mmsize/4
>> >+
>> >+        add off16q, mmsize/4
>> >+        cmp off16q, widthq
>> >+        jl .loop16
>> >+
>> >+    add widthq, rq
>> >+    cmp off16q, widthq
>> >+    jge .paraend
>> >+
>> 
>> >+    .loopr:
>> I am not sure this loop is needed: if we can read beyond the end, we can
>> reuse the SIMD code above
>Reusing the SIMD code above may write to memory that does not belong to this
>slice thread.

>IMO, the code to handle remainder columns is still necessary.


It depends on the algorithm and the size.
For example, with width=23:
Pass #0 processes [0:15]
Pass #1 processes [7:22]
Both passes cover a full 16 elements; the second simply overlaps the first.
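
A rough C sketch of the overlapping-tail idea in the width=23 example, under some assumptions: the row is at least 16 elements wide, dst and src do not alias (the overlapped elements are computed twice), and invert_block() is a hypothetical stand-in for the real 16-wide SIMD kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16  /* stand-in for the number of bytes handled per SIMD pass */

/* Hypothetical 16-wide kernel; the real one would be the SIMD loop body. */
static void invert_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)(255 - src[i]);
}

/* Process a row using only full 16-wide passes.
 * Returns the number of passes, or 0 if width < BLOCK (caller needs a
 * scalar fallback in that case). Never touches dst[width] or beyond. */
static int invert_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t x = 0;
    int passes = 0;

    if (width < BLOCK)
        return 0;

    /* Full blocks from the left edge. */
    for (; x + BLOCK <= width; x += BLOCK) {
        invert_block(dst + x, src + x);
        passes++;
    }

    /* Remainder: step back so the last block ends exactly at width. */
    if (x < width) {
        size_t start = width - BLOCK;   /* 7 when width is 23 */
        invert_block(dst + start, src + start);
        passes++;
    }
    return passes;
}
```

For width=23 this runs two passes, [0:15] and [7:22], and writes nothing past index 22, so it stays inside the caller's own range as long as that range is at least 16 wide.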

