On 12/23/2017 5:44 PM, Aurelien Jacobs wrote:
> On Sat, Dec 23, 2017 at 03:35:28PM -0300, James Almer wrote:
>> On 12/23/2017 3:01 PM, Aurelien Jacobs wrote:
>>> This was originally based on libsbc, and was fully integrated into ffmpeg.
>>>
>>> Rough speed test:
>>> C version:    speed= 592x
>>> MMX version:  speed= 785x
>>> ---
>>>  libavcodec/sbcdsp.c          |   3 +
>>>  libavcodec/sbcdsp.h          |   2 +
>>>  libavcodec/x86/Makefile      |   2 +
>>>  libavcodec/x86/sbcdsp.asm    | 284 +++++++++++++++++++++++++++++++++++++++++++
>>>  libavcodec/x86/sbcdsp_init.c |  51 ++++++++
>>>  5 files changed, 342 insertions(+)
>>>  create mode 100644 libavcodec/x86/sbcdsp.asm
>>>  create mode 100644 libavcodec/x86/sbcdsp_init.c
>>
>> [...]
>>
>>> +;*******************************************************************
>>> +;void ff_sbc_calc_scalefactors(int32_t sb_sample_f[16][2][8],
>>> +;                              uint32_t scale_factor[2][8],
>>> +;                              int blocks, int channels, int subbands)
>>> +;*******************************************************************
>>> +INIT_MMX mmx
>>> +cglobal sbc_calc_scalefactors, 5, 7, 3, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
>>> +    ; subbands = 4 * subbands * channels
>>> +    shl           subbandsd, 2
>>> +    cmp           channelsd, 2
>>> +    jl            .loop_1
>>> +    shl           subbandsd, 1
>>> +
>>> +.loop_1:
>>> +    sub           subbandsq, 8
>>> +    lea           ptrq, [sb_sample_fq + subbandsq]
>>> +
>>> +    ; blk = (blocks - 1) * 64;
>>> +    lea           blkq, [blocksq - 1]
>>> +    shl           blkd, 6
>>> +
>>> +    movq          m0, [scale_mask]
>>
>> I insist, this can be easily loaded outside the loop. You have enough
>> spare regs to store a copy.
>
> Oh, I forgot to reply to this. There isn't any register left available
> on x86_32, hence why I kept those load inside the loop.

You're not using a GPR to store the mask, nor do you need to. The mask
lives in an MMX register, and you have 5 of those left.
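
For illustration only, here is a rough C sketch of the hoisting being suggested. It is not the asm from the patch. The cglobal line declares 3 of the 8 MMX registers (mm0-mm7), so a spare one could hold the mask across iterations; in C that corresponds to reading the constant once into a local before the loop. The names calc_reload, calc_hoisted and scale_mask_mem, the flat sample layout, and the mask value are all assumptions made for this example, since the real scale_mask definition is not quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the scale_mask constant in the patch;
 * the real value is not shown in this thread. */
static const uint32_t scale_mask_mem = 1u << 15;

/* Absolute value that stays well-defined for INT32_MIN. */
static uint32_t abs_u32(int32_t s)
{
    return s < 0 ? 0u - (uint32_t)s : (uint32_t)s;
}

/* Reload variant: the constant is read from memory on every outer
 * iteration, like "movq m0, [scale_mask]" placed inside .loop_1. */
static void calc_reload(const int32_t *samples, uint32_t *out,
                        int blocks, int bands)
{
    assert(samples && out);
    for (int sb = 0; sb < bands; sb++) {
        uint32_t acc = scale_mask_mem;      /* load inside the loop */
        for (int blk = 0; blk < blocks; blk++) {
            uint32_t v = abs_u32(samples[blk * bands + sb]);
            if (v)
                acc |= v - 1;
        }
        out[sb] = acc;
    }
}

/* Hoisted variant: the constant is read once before the loop and only
 * copied per iteration, like keeping it in a spare mm register. */
static void calc_hoisted(const int32_t *samples, uint32_t *out,
                         int blocks, int bands)
{
    assert(samples && out);
    const uint32_t mask = scale_mask_mem;   /* load once, outside the loop */
    for (int sb = 0; sb < bands; sb++) {
        uint32_t acc = mask;                /* plain copy, no memory load */
        for (int blk = 0; blk < blocks; blk++) {
            uint32_t v = abs_u32(samples[blk * bands + sb]);
            if (v)
                acc |= v - 1;
        }
        out[sb] = acc;
    }
}
```

Both variants produce identical output; the only difference is where the constant is read. A C compiler will usually do this hoisting by itself, but in hand-written asm it has to be done explicitly.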