Hello,

Attached are patches that add a dedicated clear_blocks function for the ProRes decoder (proresdec2).

Currently the slice decode function uses a loop and calls the blockdsp.clear_block function once per block. After some testing, this seems to be slower than memset (for me). I checked this using the following "fake" functions in blockdsp:

    static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count)
    {
        int i;
        for (i = 0; i < block_count; i++)
            ff_clear_block_sse(blocks + (i << 6));
    }

    static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count)
    {
        int i;
        for (i = 0; i < block_count; i++)
            ff_clear_block_avx(blocks + (i << 6));
    }

The checkasm results are below (the attached patches are needed to reproduce the test).

Using the loop:

    blockdsp.clear_blocks_prores_c:   137.8
    blockdsp.clear_blocks_prores_sse: 292.0
    blockdsp.clear_blocks_prores_avx: 230.5

Using the new asm functions (Kaby Lake, OS 10.12, Clang 8.1):

    blockdsp.clear_blocks_prores_c:   153.4
    blockdsp.clear_blocks_prores_sse: 284.4
    blockdsp.clear_blocks_prores_avx: 142.2

FATE passes for me (x86_64).

Because the blocks-per-slice value in the ProRes decoder is multiplied by 2 or 4 (depending on the codec), the block count is always even, so the asm function can process two blocks per loop iteration (in the AVX version).

I have also attached a patch that fixes the comment for the clear_block dsp function (it now needs 32-byte alignment because of AVX). I am including it here to avoid a dedicated thread on the mailing list.

Martin
Jokyo Images
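P.S. For readers without the patches at hand, here is a minimal plain C sketch of the two approaches compared above: clearing a run of 64-coefficient blocks one block at a time, versus clearing the whole contiguous run in one call. The names clear_block_ref, clear_blocks_loop and clear_blocks_once are hypothetical and only for illustration. They are not the functions from the attached patches, and the real code uses SSE/AVX asm rather than memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COEFFS_PER_BLOCK 64  /* one 8x8 block of int16_t coefficients */

/* Hypothetical stand-in for blockdsp.clear_block: zero a single block. */
static void clear_block_ref(int16_t *block)
{
    memset(block, 0, COEFFS_PER_BLOCK * sizeof(*block));
}

/* Per-block loop: one call for every block in the slice. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    assert(blocks && block_count >= 0);
    for (ptrdiff_t i = 0; i < block_count; i++)
        clear_block_ref(blocks + (i << 6));  /* i << 6 == i * 64 coefficients */
}

/* Single call: the blocks are contiguous, so clear them all at once. */
static void clear_blocks_once(int16_t *blocks, ptrdiff_t block_count)
{
    assert(blocks && block_count >= 0);
    memset(blocks, 0,
           (size_t)block_count * COEFFS_PER_BLOCK * sizeof(*blocks));
}
```

Both functions leave the buffer in the same state. The only difference is the number of calls and how much work each call does, which is what the checkasm numbers above are measuring for the asm versions.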
0001-libavcodec-blockdsp-fix-comment.-clear_block-need-32.patch
Description: Binary data
0002-libavcodec-blockdsp-add-clear_block_prores.patch
Description: Binary data
0003-libavcodec-proresdec2-use-clear_blocks_prores-for-ea.patch
Description: Binary data
0004-libavcodec-blockdsp-cosmetic-indent.patch
Description: Binary data