Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-13 Thread Ronald S. Bultje
Hi,

On Mon, Jul 11, 2016 at 6:15 PM, Henrik Gramner wrote:
> On Mon, Jul 11, 2016 at 11:48 PM, Carl Eugen Hoyos
> wrote:
> > Ronald S. Bultje writes:
> >
> >> +%if ARCH_X86_64
> >
> > Just curious: Why does this not work on x86-32?
> > Isn't there some asm magic that moves some

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-11 Thread Henrik Gramner
On Mon, Jul 11, 2016 at 11:48 PM, Carl Eugen Hoyos wrote:
> Ronald S. Bultje writes:
>
>> +%if ARCH_X86_64
>
> Just curious: Why does this not work on x86-32?
> Isn't there some asm magic that moves some
> parameters to the stack if necessary?
>
> Carl Eugen

Uses more than 8 vector re

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-11 Thread Carl Eugen Hoyos
Ronald S. Bultje writes:

> +%if ARCH_X86_64

Just curious: Why does this not work on x86-32?
Isn't there some asm magic that moves some
parameters to the stack if necessary?

Carl Eugen

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-11 Thread Ronald S. Bultje
Hi,

On Sat, Jul 9, 2016 at 11:12 AM, James Almer wrote:
> On 7/8/2016 6:59 PM, Ronald S. Bultje wrote:
> > +%if ARCH_X86_64
> > +INIT_YMM avx2
>
> Add an %if HAVE_AVX2_EXTERNAL check here, because yasm 1.1.0 and older
> don't support avx2.
>
> lgtm aside from that.

Changed, and pushed.

Ronald

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-09 Thread James Almer
On 7/8/2016 6:59 PM, Ronald S. Bultje wrote:
> +%if ARCH_X86_64
> +INIT_YMM avx2

Add an %if HAVE_AVX2_EXTERNAL check here, because yasm 1.1.0 and older
don't support avx2.

lgtm aside from that.

> +cglobal vp9_idct_idct_16x16_add, 4, 4, 16, dst, stride, block, eob

[FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-08 Thread Ronald S. Bultje
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-08 Thread Michael Niedermayer
On Fri, Jul 08, 2016 at 04:40:28PM -0400, Ronald S. Bultje wrote:
> checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
> that it's about 1.65x as fast as the AVX version for the full IDCT, and
> similar speedups for the sub-IDCTs:
>
> nop: 24.6
> vp9_inv_dct_dct_16x16_add_8_1_c

[FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-08 Thread Ronald S. Bultje
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8