On 11/4/2022 5:29 AM, bin.wang-at-intel....@ffmpeg.org wrote:
+.loop2:
+    xor            rd, rd
+    pxor           m4, m4
+
+    ;Gx
+    SOBEL_MUL      0, data_n1
+    SOBEL_MUL      1, data_n2
+    SOBEL_MUL      2, data_n1
+    SOBEL_ADD      6
+    SOBEL_MUL      7, data_p2
+    SOBEL_ADD      8
+
+    cvtsi2ss       xmm4, rd
+    mulss          xmm4, xmm4
+
+    xor            rd, rd
+    ;Gy
+    SOBEL_MUL      0, data_n1
+    SOBEL_ADD      2
+    SOBEL_MUL      3, data_n2
+    SOBEL_MUL      5, data_p2
+    SOBEL_MUL      6, data_n1
+    SOBEL_ADD      8
+
+    cvtsi2ss       xmm5, rd
+    fmaddss        xmm4, xmm5, xmm5, xmm4
+
+    sqrtps         xmm4, xmm4
+    fmaddss        xmm4, xmm4, xmm0, xmm1 ;sum = sum * rdiv + bias
By using xmm# you're not taking into account any x86inc SWAPing, so this is using xmm0 and xmm1 where the single scalar float input arguments reside (at least on unix64), instead of xm0 and xm1 (xmm16 and xmm17) where the broadcasted scalars were stored. This, again, only worked by chance on unix64 because you're using scalar fmadd, and shouldn't work at all on win64.
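A hedged sketch of the fix, assuming (per the above) that the broadcasted rdiv and bias ended up in m0/m1 after x86inc's SWAPs:

```asm
    cvtsi2ss       xmm5, rd
    fmaddss        xmm4, xmm5, xmm5, xmm4
    sqrtps         xmm4, xmm4
    ; xm0/xm1 go through x86inc's register aliasing, so they follow any
    ; SWAPs (xmm16/xmm17 here on unix64) instead of reading the raw
    ; argument registers xmm0/xmm1
    fmaddss        xmm4, xmm4, xm0, xm1 ;sum = sum * rdiv + bias
```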
Also, all of these, as is, are being encoded as VEX, not EVEX, but it should be fine to leave them untouched instead of using xm#, since they will be shorter (five bytes instead of six for some) by using the lower, non callee-saved regs.
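To illustrate the size difference (byte counts are approximate and depend on the exact prefix needed; xmm20 below is just a hypothetical high register):

```asm
    vmulss         xmm4, xmm4, xmm4    ; low reg: VEX-encodable, 4-5 bytes
    vmulss         xmm20, xmm20, xmm20 ; reg above xmm15: EVEX required, 6 bytes
```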
+    cvttps2dq      xmm4, xmm4 ; trunc to integer
+    packssdw       xmm4, xmm4
+    packuswb       xmm4, xmm4
+    movd           rd, xmm4
+    mov            [dstq + xq], rb
+
+    add            xq, 1
+    cmp            xq, widthq
+    jl .loop2
+.end:
+    RET
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".