On Fri, Oct 09, 2015 at 11:53:40PM +0200, Christophe Gisquet wrote:
> Modeled from the prores version. Clips to [0;1023] and is bitexact.
> Bitexactness requires to add an offset in a different place compared
> to prores or C, and makes the function approximately 2% slower.
>
> For 16 frames of a DNxHD 4:2:2 10bits test sequence:
>
> C:    60861 decicycles in idct, 1048205 runs, 371 skips
> sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
> avx:  26272 decicycles in idct, 1048171 runs, 405 skips
> ---
>  libavcodec/x86/Makefile                   |  1 +
>  libavcodec/x86/idctdsp_init.c             | 16 ++++++++++
>  libavcodec/x86/simple_idct.h              |  3 ++
>  libavcodec/x86/simple_idct10.asm          | 53 +++++++++++++++++++++++++++++++
>  libavcodec/x86/simple_idct10_template.asm | 12 +++++++
>  5 files changed, 85 insertions(+)
>  create mode 100644 libavcodec/x86/simple_idct10.asm
>
> diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
> index a9d8032..ef7628e 100644
> --- a/libavcodec/x86/Makefile
> +++ b/libavcodec/x86/Makefile
> @@ -126,6 +126,7 @@ YASM-OBJS-$(CONFIG_QPELDSP) += x86/qpeldsp.o \
>                                 x86/fpel.o \
>                                 x86/qpel.o
>  YASM-OBJS-$(CONFIG_RV34DSP) += x86/rv34dsp.o
> +YASM-OBJS-$(CONFIG_IDCTDSP) += x86/simple_idct10.o
>  YASM-OBJS-$(CONFIG_VIDEODSP) += x86/videodsp.o
>  YASM-OBJS-$(CONFIG_VP3DSP) += x86/vp3dsp.o
>  YASM-OBJS-$(CONFIG_VP8DSP) += x86/vp8dsp.o \
> diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
> index 2c26a98..17ddc9e 100644
> --- a/libavcodec/x86/idctdsp_init.c
> +++ b/libavcodec/x86/idctdsp_init.c
> @@ -85,4 +85,20 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
>          c->put_pixels_clamped = ff_put_pixels_clamped_sse2;
>          c->add_pixels_clamped = ff_add_pixels_clamped_sse2;
>      }
> +
> +    if (ARCH_X86_64 &&
> +        avctx->bits_per_raw_sample == 10 && avctx->lowres == 0 &&
> +        (avctx->idct_algo == FF_IDCT_AUTO ||
> +         avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
> +         avctx->idct_algo == FF_IDCT_SIMPLE)) {
> +        if (EXTERNAL_SSE2(cpu_flags)) {
> +            c->idct_put = ff_simple_idct10_put_sse2;
> +            c->perm_type = FF_IDCT_PERM_TRANSPOSE;

perm_type represents the permutation for idct_put, idct_add and idct.
Setting just one of them risks having a wrong permutation for the other 2.

If some cases are unused, they could be set to NULL to avoid hard-to-debug
artifacts if they become used, though setting them to a matching idct seems
more correct.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

DNS cache poisoning attacks, popular search engine, Google internet authority
dont be evil, please
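
To illustrate the point about perm_type above: one permutation field is shared by all three function pointers, so whatever sets it has to keep idct, idct_put and idct_add in agreement with it. The following is only a minimal sketch under stated assumptions. MockIDCTContext, MockPerm and every mock_* function are hypothetical stand-ins invented for this example, not the real IDCTDSPContext or any function from the patch, and the function bodies are empty placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real libavcodec definitions. */
enum MockPerm { MOCK_PERM_NONE, MOCK_PERM_TRANSPOSE };

typedef struct MockIDCTContext {
    void (*idct)(int16_t *block);
    void (*idct_put)(uint8_t *dst, ptrdiff_t stride, int16_t *block);
    void (*idct_add)(uint8_t *dst, ptrdiff_t stride, int16_t *block);
    enum MockPerm perm_type;   /* one permutation shared by all three */
} MockIDCTContext;

/* Placeholder "C" variants: assumed to expect MOCK_PERM_NONE. */
static void mock_c_idct(int16_t *b) { (void)b; }
static void mock_c_idct_put(uint8_t *d, ptrdiff_t s, int16_t *b)
{ (void)d; (void)s; (void)b; }
static void mock_c_idct_add(uint8_t *d, ptrdiff_t s, int16_t *b)
{ (void)d; (void)s; (void)b; }

/* Placeholder "SIMD" variants: assumed to expect MOCK_PERM_TRANSPOSE. */
static void mock_simd_idct(int16_t *b) { (void)b; }
static void mock_simd_idct_put(uint8_t *d, ptrdiff_t s, int16_t *b)
{ (void)d; (void)s; (void)b; }
static void mock_simd_idct_add(uint8_t *d, ptrdiff_t s, int16_t *b)
{ (void)d; (void)s; (void)b; }

/* Returns 1 if every non-NULL pointer matches the shared perm_type. */
static int mock_is_consistent(const MockIDCTContext *c)
{
    if (c->perm_type == MOCK_PERM_TRANSPOSE)
        return (!c->idct     || c->idct     == mock_simd_idct)     &&
               (!c->idct_put || c->idct_put == mock_simd_idct_put) &&
               (!c->idct_add || c->idct_add == mock_simd_idct_add);
    return (!c->idct     || c->idct     == mock_c_idct)     &&
           (!c->idct_put || c->idct_put == mock_c_idct_put) &&
           (!c->idct_add || c->idct_add == mock_c_idct_add);
}

/*
 * have_simd:     take the SIMD path at all.
 * have_full_set: matching SIMD idct and idct_add exist; if not, the
 *                unused entries are set to NULL instead of being left
 *                pointing at functions that expect another permutation.
 */
static void mock_init(MockIDCTContext *c, int have_simd, int have_full_set)
{
    c->idct      = mock_c_idct;
    c->idct_put  = mock_c_idct_put;
    c->idct_add  = mock_c_idct_add;
    c->perm_type = MOCK_PERM_NONE;

    if (!have_simd)
        return;

    c->idct_put  = mock_simd_idct_put;
    c->idct      = have_full_set ? mock_simd_idct     : NULL;
    c->idct_add  = have_full_set ? mock_simd_idct_add : NULL;
    c->perm_type = MOCK_PERM_TRANSPOSE;
}

/* Small self-check contrasting the risky pattern with the two fixes. */
static void mock_self_check(void)
{
    MockIDCTContext c;

    mock_init(&c, 0, 0);
    c.idct_put  = mock_simd_idct_put;    /* only one pointer replaced */
    c.perm_type = MOCK_PERM_TRANSPOSE;   /* idct/idct_add now mismatch */
    assert(!mock_is_consistent(&c));

    mock_init(&c, 1, 0);                 /* unused entries set to NULL */
    assert(mock_is_consistent(&c));

    mock_init(&c, 1, 1);                 /* all three set to a matching idct */
    assert(mock_is_consistent(&c));
}
```

With NULL entries, an unexpected caller would crash right away instead of producing subtly wrong output; providing the full matching set avoids both problems, which is the trade-off described above.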
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel