On Mon, Jul 13, 2015 at 11:39:15PM -0300, James Almer wrote:
> On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> > +INIT_XMM sse4
> > +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> > +pxor m0, m0
> > +.loop:
> > +mova m1, [sum0q+mmsize*0]
> > +mova
On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> +INIT_XMM sse4
> +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> +pxor m0, m0
> +.loop:
> +mova m1, [sum0q+mmsize*0]
> +mova m2, [sum0q+mmsize*1]
> +mova m3, [sum0q+mmsize*2]
> +m
Both are 2-2.5x faster than their C counterpart.
---
libavfilter/ssim.h | 36
libavfilter/vf_ssim.c | 26 --
libavfilter/x86/Makefile | 2 +
libavfilter/x86/vf_ssim.asm| 191 +
libavfilter/x86/vf_ssim_init.c |
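
For readers without the C side at hand, here is a rough scalar sketch of what a routine like ssim_end_line is presumably reducing. It is modeled on the common x264-style SSIM reduction and is not taken from the patch. The name ssim_end_line_ref, the [4] layout (s1, s2, ss, s12 per 4x4 block) and the 8-bit constants are all assumptions.

```c
#include <assert.h>

/* SSIM of one 8x8 window from its accumulated sums.
 * Assumes 8-bit pixels; s1/s2 are pixel sums of the two images,
 * ss is the sum of squares of both, s12 the sum of cross products. */
static float ssim_end1(int s1, int s2, int ss, int s12)
{
    /* Stabilising constants scaled for 64-pixel windows (assumption). */
    const int ssim_c1 = (int)(.01 * .01 * 255 * 255 * 64 + .5);
    const int ssim_c2 = (int)(.03 * .03 * 255 * 255 * 64 * 63 + .5);
    int vars  = ss  * 64 - s1 * s1 - s2 * s2;
    int covar = s12 * 64 - s1 * s2;

    return (float)(2 * s1 * s2 + ssim_c1) * (float)(2 * covar + ssim_c2) /
           ((float)(s1 * s1 + s2 * s2 + ssim_c1) * (float)(vars + ssim_c2));
}

/* Hypothetical scalar reference: sum0 and sum1 are two adjacent rows of
 * per-4x4-block sums, each holding width + 1 entries. Every output window
 * combines a 2x2 neighbourhood of blocks into one 8x8 SSIM value. */
static float ssim_end_line_ref(const int (*sum0)[4], const int (*sum1)[4],
                               int width)
{
    float ssim = 0.0f;

    assert(width >= 0);
    for (int i = 0; i < width; i++)
        ssim += ssim_end1(sum0[i][0] + sum0[i + 1][0] + sum1[i][0] + sum1[i + 1][0],
                          sum0[i][1] + sum0[i + 1][1] + sum1[i][1] + sum1[i + 1][1],
                          sum0[i][2] + sum0[i + 1][2] + sum1[i][2] + sum1[i + 1][2],
                          sum0[i][3] + sum0[i + 1][3] + sum1[i][3] + sum1[i + 1][3]);
    return ssim;
}
```

Each iteration touches four consecutive ints from two rows, which would be consistent with the mmsize-strided loads from sum0q in the quoted asm, though the snippet is cut off before that can be confirmed.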