>> This is still slower then the memset numbers from the first test, why
>> the high variation in there?

Hello,

Maybe the results in my first email were not very clear.

For the results below, I ran the checkasm test 10 times in each case and took the fastest result.

Original benchmark (similar to the current approach in proresdec), using these functions:

    static void clear_blocks_prores_c(int16_t *blocks, ptrdiff_t block_count)
    {
        int i;
        for (i = 0; i < block_count; i++) {
            memset(blocks + (i << 6), 0, sizeof(int16_t) * 64);
        }
    }

    static void ff_clear_blocks_prores_sse(int16_t *blocks, ptrdiff_t block_count)
    {
        int i;
        for (i = 0; i < block_count; i++)
            ff_clear_block_sse(blocks + (i << 6));
    }

    static void ff_clear_blocks_prores_avx(int16_t *blocks, ptrdiff_t block_count)
    {
        int i;
        for (i = 0; i < block_count; i++)
            ff_clear_block_avx(blocks + (i << 6));
    }

    blockdsp.clear_blocks_prores_c:   570.3
    blockdsp.clear_blocks_prores_sse: 325.8
    blockdsp.clear_blocks_prores_avx: 190.3

New version:

    blockdsp.clear_blocks_prores_c:   138.3
    blockdsp.clear_blocks_prores_sse: 274.6
    blockdsp.clear_blocks_prores_avx: 137.6

These numbers are with the new patch, which uses the following for the C version:

    static void clear_blocks_prores_c(int16_t *blocks, ptrdiff_t block_count)
    {
        memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
    }

Martin
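
For reference, here is a minimal standalone sketch of why the two C variants can be swapped: the blocks are contiguous, so a single memset over 64 * block_count coefficients should clear exactly the same bytes as the per-block loop. The names clear_blocks_loop, clear_blocks_single, clear_variants_agree and COEFFS_PER_BLOCK are made up for this illustration and are not part of the patch. It only checks that both variants give the same result and says nothing about speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COEFFS_PER_BLOCK 64  /* 64 int16_t coefficients per block, as in the mail */

/* Per-block variant: one memset call for every 64-coefficient block. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    for (i = 0; i < block_count; i++)
        memset(blocks + i * COEFFS_PER_BLOCK, 0,
               sizeof(int16_t) * COEFFS_PER_BLOCK);
}

/* Single-call variant: the blocks are contiguous, so one memset covers them all. */
static void clear_blocks_single(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0,
           sizeof(int16_t) * COEFFS_PER_BLOCK * (size_t)block_count);
}

/* Hypothetical helper: returns 1 if both variants leave all-zero buffers,
 * 0 otherwise (including on allocation failure). */
static int clear_variants_agree(ptrdiff_t block_count)
{
    size_t n, k;
    int16_t *a, *b;
    int ok = 1;

    assert(block_count > 0);
    n = (size_t)block_count * COEFFS_PER_BLOCK;
    a = malloc(n * sizeof(*a));
    b = malloc(n * sizeof(*b));
    if (!a || !b) {
        free(a);
        free(b);
        return 0;
    }

    /* Fill both buffers with a nonzero pattern so the clearing is observable. */
    for (k = 0; k < n; k++)
        a[k] = b[k] = 0x1234;

    clear_blocks_loop(a, block_count);
    clear_blocks_single(b, block_count);

    /* Every coefficient in both buffers must now be zero. */
    for (k = 0; k < n; k++)
        if (a[k] != 0 || b[k] != 0)
            ok = 0;

    free(a);
    free(b);
    return ok;
}
```

Calling clear_variants_agree(8) from a small main() should return 1 for any positive block count, assuming the allocations succeed.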