Re: [FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

2015-07-13 Thread Michael Niedermayer
On Mon, Jul 13, 2015 at 11:39:15PM -0300, James Almer wrote:
> On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> > +INIT_XMM sse4
> > +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> > +pxor m0, m0
> > +.loop:
> > +mova m1, [sum0q+mmsize*0]
> > +mova

Re: [FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

2015-07-13 Thread James Almer
On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> +INIT_XMM sse4
> +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> +pxor m0, m0
> +.loop:
> +mova m1, [sum0q+mmsize*0]
> +mova m2, [sum0q+mmsize*1]
> +mova m3, [sum0q+mmsize*2]
> +m

[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

2015-07-12 Thread Ronald S. Bultje
Both are 2-2.5x faster than their C counterpart.
---
 libavfilter/ssim.h | 36
 libavfilter/vf_ssim.c | 26 --
 libavfilter/x86/Makefile | 2 +
 libavfilter/x86/vf_ssim.asm | 191 +
 libavfilter/x86/vf_ssim_init.c |

[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

2015-07-12 Thread Ronald S. Bultje
Both are 2-2.5x faster than their C counterpart.
---
 libavfilter/ssim.h | 36
 libavfilter/vf_ssim.c | 26 --
 libavfilter/x86/Makefile | 2 +
 libavfilter/x86/vf_ssim.asm | 191 +
 libavfilter/x86/vf_ssim_init.c |

[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

2015-07-11 Thread Ronald S. Bultje
Both are 2-2.5x faster than their C counterpart.
---
 libavfilter/ssim.h | 36
 libavfilter/vf_ssim.c | 26 --
 libavfilter/x86/Makefile | 2 +
 libavfilter/x86/vf_ssim.asm | 191 +
 libavfilter/x86/vf_ssim_init.c |