On 2017-02-15 17:55, James Almer wrote: > On 2/13/2017 9:44 AM, James Darnley wrote: >> x86-64 only >> >> Yorkfield: >> - sse2: 2.16x (434 vs. 201 cycles) >> >> Skylake: >> - sse2: 3.04x (378 vs. 124 cycles) >> - avx: 3.29x (378 vs. 115 cycles) >> --- >> libavcodec/x86/h264_deblock.asm | 119 >> ++++++++++++++++++++++++++++++++++++++++ >> libavcodec/x86/h264dsp_init.c | 10 ++++ >> 2 files changed, 129 insertions(+) >> >> diff --git a/libavcodec/x86/h264_deblock.asm >> b/libavcodec/x86/h264_deblock.asm >> index 509a0dbe0c..f47a199e8f 100644 >> --- a/libavcodec/x86/h264_deblock.asm >> +++ b/libavcodec/x86/h264_deblock.asm >> @@ -377,10 +377,129 @@ cglobal deblock_h_luma_8, 5,9,0,0x60+16*WIN64 >> RET >> %endmacro >> >> +; TODO: use macro arguments >> +%macro TRANSPOSE_8X8B_XMM 8 > > Why not put this in x86util? And using arguments, of course. > Also, just call it TRANSPOSE_8X8B.
I could call it that but people might be tempted to think it would work on mmx registers. Perhaps I can add a check for mmsize and error in that case. >> + punpcklbw m0, m1 >> + punpcklbw m2, m3 >> + punpcklbw m4, m5 >> + punpcklbw m6, m7 >> + >> + punpckhwd m1, m0, m2 >> + punpcklwd m0, m2 > > Use SBUTTERFLY here and below. Will do. >> + >> + punpckhwd m5, m4, m6 >> + punpcklwd m4, m6 >> + >> + punpckhdq m2, m0, m4 >> + punpckldq m0, m4 >> + >> + punpckhdq m6, m1, m5 >> + punpckldq m1, m5 >> + >> + MOVHL m4, m0 >> + MOVHL m3, m2 >> + MOVHL m7, m6 >> + MOVHL m5, m1 >> + SWAP 1, 4 >> +%endmacro >> + >> +%macro TRANSPOSE_8X8B_XMM 0 >> + TRANSPOSE_8X8B_XMM 0, 1, 2, 3, 4, 5, 6, 7 > > This seems wrong, or at least superfluous. I had the idea of letting it be a default so I (or other users) wouldn't have to enter 0-7 every time. Perhaps I could define the macro to take zero to seven parameters but that is slightly suboptimal because it would allow one to six parameters to be given >> +%endmacro >> + >> +%macro DEBLOCK_H_LUMA_MBAFF 0 >> + >> +cglobal deblock_h_luma_mbaff_8, 5, 9, 10, 8*16, pix_, stride_, alpha_, >> beta_, tc0_ > > Why the underscores? Because I grew sick of seeing "named", "nameq", etc. So I decided I wanted a separator. I could change x264asm but that seems rather involved. (Ideally I would change nasm/yasm to support unicode so I could use subscripted numbers but that is even harder. Plus I would struggle to enter the characters using my keyboard. Heheheh) >> + movsxd stride_q, stride_d >> + dec alpha_d >> + dec beta_d >> + mov r5, pix_q >> + lea r6, [3*stride_q] > > Call r6 stride3. I'll consider it. Thanks for the comments. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel