On Thu, Oct 5, 2017 at 4:58 PM, Martin Vignali <martin.vign...@gmail.com> wrote:
> Hello,
>
> In attach patchs to add a dedicated func for clear_block inside
> prores decoding (proresdec2)
>
> currently slice decode func use a loop and call the blockdsp.clear_block
> func
>
> After some test, it seems to be slower, than memset (for me)
> I check using this "fake" func in the blockdsp
> static void ff_clear_blocks_prores_sse_loop(int16_t * blocks, ptrdiff_t
> block_count){
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_sse(blocks+(i<<6));
> }
>
> static void ff_clear_blocks_prores_avx_loop(int16_t * blocks, ptrdiff_t
> block_count){
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_avx(blocks+(i<<6));
> }
>
> the result in checkasm are (need patch in attach to reproduce the test) :
> using the loop
> blockdsp.clear_blocks_prores_c: 137.8
> blockdsp.clear_blocks_prores_sse: 292.0
> blockdsp.clear_blocks_prores_avx: 230.5
>
>
> Using the new asm func this is the result (Kaby Lake, os 10.12, Clang 8.1)
> blockdsp.clear_blocks_prores_c: 153.4
> blockdsp.clear_blocks_prores_sse: 284.4
> blockdsp.clear_blocks_prores_avx: 142.2
>
>
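
For reference, the two approaches compared in the quoted message can be sketched in plain C. This is only an illustration, not the code from the attached patches: clear_block(), clear_blocks_loop() and clear_blocks_memset() are hypothetical names, and the 64-coefficient block size is an assumption taken from the i<<6 stride in the quoted loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed block size: 64 int16_t coefficients, matching the i<<6 stride above. */
#define COEFFS_PER_BLOCK 64

/* Hypothetical stand-in for a per-block clear routine. */
static void clear_block(int16_t *block)
{
    memset(block, 0, COEFFS_PER_BLOCK * sizeof(*block));
}

/* Loop variant: one call per block, as in the quoted "fake" functions. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    for (ptrdiff_t i = 0; i < block_count; i++)
        clear_block(blocks + i * COEFFS_PER_BLOCK);
}

/* Single-call variant: clear all contiguous blocks with one memset. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, (size_t)block_count * COEFFS_PER_BLOCK * sizeof(*blocks));
}
```

Both functions leave the buffer in the same state; they differ only in the number of calls made, which is one possible source of the per-call overhead being measured.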
This is still slower than the memset numbers from the first test. Why is there such high variation in the numbers?

- Hendrik