On Sun, 14 Jan 2018, Henrik Gramner wrote:
> On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint <c...@passwd.hu> wrote:
>> + .loop:
>> + movu m0, [src1q + xq]
>> + movu m1, [src2q + xq]
>> + punpckl%1%2 m5, m0, m2 ; 0e0f0g0h
>> + punpckh%1%2 m0, m2 ; 0a0b0c0d
>> + punpckl%1%2 m6, m1, m2 ; 0E0F0G0H
>> + punpckh%1%2 m1, m2 ; 0A0B0C0D
>> + pmull%2 m0, m3
>> + pmull%2 m5, m3
>> + pmull%2 m1, m4
>> + pmull%2 m6, m4
>> + padd%2 m0, m7
>> + padd%2 m5, m7
>> + padd%2 m0, m1
>> + padd%2 m5, m6
>
> pmaddubsw should work here for the 8-bit case. pmaddwd might work for
> the 16-bit case depending on how many bits are actually used.
As far as I can see, I have to make the blending factors 7-bit (15-bit
for the 16-bit case) for this to work, because the pmadd* instructions
treat the factors as signed integers. Losing 1 bit of precision in the
blending factors is probably not a problem for the framerate filter.
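
To make the signedness constraint concrete, here is a rough scalar C
sketch of a single output lane of pmaddubsw and pmaddwd, plus two blend
helpers. The helper names, the rounding constants (128 and 64) and the
way the 7-bit factors are derived from the 8-bit ones are assumptions
for illustration only, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit result lane of pmaddubsw: the destination operand supplies
 * unsigned bytes, the source operand signed bytes; the two products are
 * added with signed saturation. */
static int16_t pmaddubsw_lane(uint8_t u0, uint8_t u1, int8_t s0, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* One 32-bit result lane of pmaddwd: both operands supply signed words.
 * The single wrapping case (all four inputs -32768) is excluded here. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    assert(sum <= INT32_MAX);
    return (int32_t)sum;
}

/* Hypothetical 8-bit-factor blend (f1 + f2 == 256) as a plain multiply. */
static uint8_t blend_8bit_factors(uint8_t p1, uint8_t p2, int f1, int f2)
{
    return (uint8_t)((p1 * f1 + p2 * f2 + 128) >> 8);
}

/* Hypothetical 7-bit-factor blend (f1 + f2 == 128) through pmaddubsw. */
static uint8_t blend_7bit_factors(uint8_t p1, uint8_t p2, int8_t f1, int8_t f2)
{
    return (uint8_t)((pmaddubsw_lane(p1, p2, f1, f2) + 64) >> 7);
}
```

With factors that sum to 128, the pmaddubsw sum is at most
255 * 128 = 32640, so the saturation never triggers, whereas a factor
byte of 0xC0 (192) would be read as -64. If the 7-bit pair is taken as
f1 >> 1 and 128 - (f1 >> 1), the two blends differ by at most 1, for
example blend_8bit_factors(255, 0, 131, 125) gives 130 and
blend_7bit_factors(255, 0, 65, 63) gives 129.
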
So my loop would look like this:
.loop:
    movu            m0, [src1q + xq]
    movu            m1, [src2q + xq]
    SBUTTERFLY    %1%2, 0, 1, 5         ; aAbBcCdD
                                        ; eEfFgGhH
    pmadd%3         m0, m3
    pmadd%3         m1, m3
    padd%2          m0, m7
    padd%2          m1, m7
    psrl%2          m0, %4              ; 0A0B0C0D
    psrl%2          m1, %4              ; 0E0F0G0H
    packus%2%1      m0, m1              ; ABCDEFGH
    movu   [dstq + xq], m0
    add             xq, mmsize
    jl .loop
Is this what you had in mind?
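
For checking the SIMD version, a scalar C model of the 8-bit loop might
look roughly like the sketch below. The function name and signature are
hypothetical, and it assumes that m3 holds factor1 and factor2 as
interleaved bytes, that the factors sum to 128, and that half is 64 and
the shift is 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the proposed 8-bit loop (hypothetical name and signature).
 * Assumes factor1 + factor2 == 128, so half == 64 and shift == 7. */
static void blend_frames_ref(uint8_t *dst, const uint8_t *src1,
                             const uint8_t *src2, size_t n,
                             int factor1, int factor2)
{
    const int shift = 7;
    const int half  = 1 << (shift - 1);

    assert(factor1 > 0 && factor2 > 0 && factor1 + factor2 == (1 << shift));

    for (size_t x = 0; x < n; x++) {
        /* SBUTTERFLY pairs src1[x] with src2[x]; pmaddubsw then forms
         * src1[x] * factor1 + src2[x] * factor2 in one 16-bit lane. */
        int acc = src1[x] * factor1 + src2[x] * factor2;

        /* padd with half, then psrl by shift. */
        acc = (acc + half) >> shift;

        /* packus saturates to the unsigned 8-bit range. */
        dst[x] = (uint8_t)(acc > 255 ? 255 : acc);
    }
}
```

Comparing the output of such a reference against the assembly for
random inputs and every valid factor pair is one way to confirm that the
interleave and pack steps keep the pixels in their original order.
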
>> + pinsrw xm3, r8m, 0 ; factor1
>> + pinsrw xm4, r9m, 0 ; factor2
>> + pinsrw xm7, r10m, 0 ; half
>> + SPLATW m3, xm3
>> + SPLATW m4, xm4
>> + SPLATW m7, xm7
>
> Use vpbroadcast* from memory on AVX2; otherwise use movd instead of
> pxor+pinsrw.
>> + pxor m3, m3
>> + pxor m4, m4
>> + pxor m7, m7
>> + pinsrw xm3, r8m, 0 ; factor1
>> + pinsrw xm4, r9m, 0 ; factor2
>> + pinsrw xm7, r10m, 0 ; half
>> + XSPLATD 3
>> + XSPLATD 4
>> + XSPLATD 7
>
> Ditto.
>> + neg word r11m ; shift = -shift
>> + add word r11m, 16 ; shift += 16
>> + pxor m2, m2
>> + pinsrw xm2, r11m, 0 ; 16 - shift
>> + pslld m3, xm2
>> + pslld m4, xm2
>> + pslld m7, xm2
>
> You probably want to use a temporary register instead of doing slow
> load-modify-store instructions.
OK, I will rework these, although this is only the initialization
code, so I guess it is not performance critical.
Thanks,
Marton
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel