On 1/14/2016 11:05 PM, James Darnley wrote:
> 2.6 times faster
> ---
> I have one question now.  Should I make the function name match the assembly
> existing deblock/loop filter functions?  I took the current name from the C
> (as I was originally trying to use a gather instruction but that didn't offer
> any benefit).
> ---
>  libavcodec/x86/h264_deblock.asm | 40 ++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/h264dsp_init.c   |  4 ++++
>  2 files changed, 44 insertions(+)
>
> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> index 5151f3c..20f0814 100644
> --- a/libavcodec/x86/h264_deblock.asm
> +++ b/libavcodec/x86/h264_deblock.asm
> @@ -864,7 +864,47 @@ ff_chroma_inter_body_mmxext:
>      DEBLOCK_P0_Q0
>      ret
>
> +cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize

This will not work with x86_32 compilers that don't have an aligned stack
(like MSVC), because r6 is needed to store the stack pointer.

> +    %if ARCH_X86_64
> +        %define buf0 [rsp+16]
> +        %define buf1 [rsp+8]
> +    %else
> +        %define buf0 r0m
> +        %define buf1 r2m
> +    %endif
> +
> +    movd       m6, [r4]

Since r4 is free after this point, you can use it instead of r6 to easily
solve the above.
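
Roughly like this (untested sketch). It assumes that the part of the function
not quoted here only uses r6 as a scratch register after the movd, so that
the GPR count in cglobal can drop from 7 to 6 and leave a register for the
saved stack pointer on x86_32:

```
; assumption: r6 is not needed before this point, and r4 is not read again
cglobal h264_h_loop_filter_chroma422_8, 5, 6, 8, mmsize + ARCH_X86_64*2*mmsize
    %if ARCH_X86_64
        %define buf0 [rsp+16]
        %define buf1 [rsp+8]
    %else
        %define buf0 r0m
        %define buf1 r2m
    %endif

    movd       m6, [r4]   ; last read of the 5th argument
    ; from here on, use r4 wherever the patch currently uses r6
```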