On 4/18/2016 6:25 PM, Christophe Gisquet wrote:
> 2016-04-18 21:18 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
>> > this breaks (only noise)
>> > [CCCP]_Mega_Weird_Audio_Test.mkv track 23
> Worthwhile sample.
>
> I rewrote the patch to reduce code duplication, and I fixed the issue
> (misread a shift).
>
> -- Christophe
>
>
> 0005-x86-lossless-audio-SSE4-madd-32bits.patch
>
>
> From a0d4a96c032d73bc0e34fec320497aefafba3c28 Mon Sep 17 00:00:00 2001
> From: Christophe Gisquet <christophe.gisq...@gmail.com>
> Date: Mon, 18 Apr 2016 13:20:07 +0200
> Subject: [PATCH 5/7] x86: lossless audio: SSE4 madd 32bits
>
> The unique user so far is wmalossless 24bits. The few samples tested show an
> order of 8, so more unrolling or an avx2 version do not make sense.
>
> Timings: 72 -> 49 cycles
> ---
>  libavcodec/x86/lossless_audiodsp.asm    | 31 +++++++++++++++++++++++++------
>  libavcodec/x86/lossless_audiodsp_init.c |  7 +++++++
>  2 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/x86/lossless_audiodsp.asm
> b/libavcodec/x86/lossless_audiodsp.asm
> index 5597dad..d00869b 100644
> --- a/libavcodec/x86/lossless_audiodsp.asm
> +++ b/libavcodec/x86/lossless_audiodsp.asm
> @@ -22,13 +22,17 @@
>
>  SECTION .text
>
> -%macro SCALARPRODUCT 0
> +%macro SCALARPRODUCT 1
>  ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3,
>  ;                                     int order, int mul)
> -cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
> -    shl orderq, 1
> +; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3,
> +;                                     int order, int mul)
> +cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul
> +    shl orderq, (%1/16)
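order is an int, so it may be better to use orderd here to make sure the upper half of the register is cleared on x86_64. (The x86_64 calling conventions leave the upper 32 bits of a register holding an int argument undefined, while any write to a 32-bit register zero-extends into the full 64-bit register.) I wonder why this was never an issue until now, though.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel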