Hi,

On Thu, Jan 14, 2016 at 9:55 PM, James Almer <jamr...@gmail.com> wrote:
> On 1/14/2016 11:05 PM, James Darnley wrote:
> > 2.6 times faster
> > ---
> > I have one question now.  Should I make the function name match the assembly
> > existing deblock/loop filter functions?  I took the current name from the C (as
> > I was originally trying to use a gather instruction but that didn't offer any
> > benefit).
> > ---
> >  libavcodec/x86/h264_deblock.asm | 40 ++++++++++++++++++++++++++++++++++++++++
> >  libavcodec/x86/h264dsp_init.c   |  4 ++++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> > index 5151f3c..20f0814 100644
> > --- a/libavcodec/x86/h264_deblock.asm
> > +++ b/libavcodec/x86/h264_deblock.asm
> > @@ -864,7 +864,47 @@ ff_chroma_inter_body_mmxext:
> >      DEBLOCK_P0_Q0
> >      ret
> >
> > +cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize
>
> This will not work with x86_32 compilers that don't have aligned stack (Like msvc)
> because r6 is needed to store the stack pointer.

If you don't need r%dm (it looks like you don't, but I didn't check
exhaustively), you can also use a negative stack size
(0 - mmsize - ARCH_X86_64 * 2 * mmsize). x86inc will then not reserve an
extra register to hold the original stack pointer, so r6 stays free.

Ronald
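For illustration, the suggestion above would change only the cglobal line, roughly as sketched below. This assumes the x86inc.asm convention that the fourth numeric argument to cglobal is the stack size and that a negative value asks it not to reserve an extra register for the original stack pointer. The function body is not shown in the quoted patch and is left out here.

```asm
; Sketch only, not a tested patch. Assumes x86inc.asm's cglobal:
;   cglobal name, num_args, num_regs, num_xmm_regs, stack_size
; A negative stack_size requests that no extra GPR be used to hold
; the original stack pointer, so all 7 GPRs remain usable on x86_32.
cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, 0 - mmsize - ARCH_X86_64*2*mmsize
    ; (function body as in the patch; it must not reference r0m, r1m, ...,
    ;  since those stack-argument aliases are what the saved pointer is for)
```

The trade-off is the one stated above: this only helps if the function never reads its arguments back from the stack via r%dm after the prologue.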