On 6/27/2017 8:19 PM, Ivan Kalvachev wrote:
> On 6/27/17, James Almer <jamr...@gmail.com> wrote:
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>>  libavfilter/x86/vf_blend.asm    | 25 +++++++++++++++++++++++++
>>  libavfilter/x86/vf_blend_init.c |  4 ++++
>>  tests/checkasm/vf_blend.c       |  1 +
>>  3 files changed, 30 insertions(+)
>>
>> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
>> index 33b1ad1496..25f6f5affc 100644
>> --- a/libavfilter/x86/vf_blend.asm
>> +++ b/libavfilter/x86/vf_blend.asm
>> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>>      jl .loop
>>  BLEND_END
>>
>> +BLEND_INIT extremity, 8
>> +    pxor       m2, m2
>> +    mova       m4, [pw_255]
>> +.nextrow:
>> +    mov        xq, widthq
>> +
>> +    .loop:
>> +        movu            m0, [topq + xq]
>> +        movu            m1, [bottomq + xq]
>> +        punpckhbw       m5, m0, m2
>> +        punpcklbw       m0, m2
>> +        punpckhbw       m6, m1, m2
>> +        punpcklbw       m1, m2
>> +        psubw           m3, m4, m0
>> +        psubw           m7, m4, m5
>> +        psubw           m3, m1
>> +        psubw           m7, m6
>> +        ABS1            m3, m1
>> +        ABS1            m7, m6
>
> Minor nitpick.
>
> There exists ABS2 that takes 4 parameters and that does
> two interleaved ABS1, that are (hopefully) faster on sse2.
> It should generate exactly the same code on ssse3.
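
For anyone following the arithmetic: the quoted loop appears to widen each byte to a 16-bit word, compute 255 - top - bottom, and take the absolute value of the result. Below is a minimal scalar sketch in C under that reading of the asm. The names extremity_px and extremity_row are hypothetical and are not functions from the patch; the narrowing back to bytes is also an assumption, since the quoted hunk stops before the pack/store step.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar equivalent of one lane of the quoted SIMD loop.
 * The intermediate is held in 16 bits, mirroring the punpck{l,h}bw
 * widening in the asm: 255 - top - bottom lies in [-255, 255]. */
static uint8_t extremity_px(uint8_t top, uint8_t bottom)
{
    int16_t d = (int16_t)(255 - top - bottom); /* psubw m3, m4, m0 ; psubw m3, m1 */
    return (uint8_t)abs(d);                    /* ABS1 / ABS2 step; result fits in 0..255 */
}

/* Hypothetical row helper: applies the per-pixel operation across one row. */
static void extremity_row(const uint8_t *top, const uint8_t *bottom,
                          uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = extremity_px(top[x], bottom[x]);
}
```

Because the absolute value never exceeds 255, the narrowing cast in this sketch cannot lose information.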

Ah nice, pushed a change to use them. Thanks.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel