On 6/27/17, James Almer <jamr...@gmail.com> wrote:
> Signed-off-by: James Almer <jamr...@gmail.com>
> ---
>  libavfilter/x86/vf_blend.asm    | 25 +++++++++++++++++++++++++
>  libavfilter/x86/vf_blend_init.c |  4 ++++
>  tests/checkasm/vf_blend.c       |  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
> index 33b1ad1496..25f6f5affc 100644
> --- a/libavfilter/x86/vf_blend.asm
> +++ b/libavfilter/x86/vf_blend.asm
> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>      jl .loop
>  BLEND_END
>
> +BLEND_INIT extremity, 8
> +    pxor       m2, m2
> +    mova       m4, [pw_255]
> +.nextrow:
> +    mov        xq, widthq
> +
> +    .loop:
> +        movu            m0, [topq + xq]
> +        movu            m1, [bottomq + xq]
> +        punpckhbw       m5, m0, m2
> +        punpcklbw       m0, m2
> +        punpckhbw       m6, m1, m2
> +        punpcklbw       m1, m2
> +        psubw           m3, m4, m0
> +        psubw           m7, m4, m5
> +        psubw           m3, m1
> +        psubw           m7, m6
> +        ABS1            m3, m1
> +        ABS1            m7, m6
Minor nitpick: there is an ABS2 macro that takes four parameters (two values and two temporaries) and performs two interleaved ABS1 operations, e.g. ABS2 m3, m7, m1, m6 in place of the two ABS1 lines. Interleaving the two independent sequences should (hopefully) make it faster on SSE2, and on SSSE3 it should generate exactly the same code.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel