Hello,

Attached are patches that add a dedicated clear_block function for
ProRes decoding (proresdec2).

Currently, the slice decode function uses a loop and calls the
blockdsp.clear_block function once per block.

After some testing, this seems to be slower than memset (at least for me).
I checked this with the following "fake" functions in blockdsp:
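static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    /* each block holds 64 int16_t coefficients, hence the i << 6 offset */
    for (i = 0; i < block_count; i++)
        ff_clear_block_sse(blocks + (i << 6));
}

static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    /* each block holds 64 int16_t coefficients, hence the i << 6 offset */
    for (i = 0; i < block_count; i++)
        ff_clear_block_avx(blocks + (i << 6));
}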
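
For readers without the patches applied, the two approaches being compared (one call per block versus a single clear over the whole contiguous range) can be sketched in portable C. This is only an illustration: clear_one_block, clear_blocks_loop and clear_blocks_memset are hypothetical names, not functions from blockdsp, and the sketch says nothing about the relative speed of the real SSE/AVX code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COEFFS_PER_BLOCK 64 /* one 8x8 block of int16_t coefficients */

/* Hypothetical stand-in for a per-block clear routine. */
static void clear_one_block(int16_t *block)
{
    memset(block, 0, COEFFS_PER_BLOCK * sizeof(*block));
}

/* Variant 1: one call per block, like the "fake" loop functions. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    for (i = 0; i < block_count; i++)
        clear_one_block(blocks + i * COEFFS_PER_BLOCK);
}

/* Variant 2: a single memset over all blocks, which are contiguous. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0,
           (size_t)block_count * COEFFS_PER_BLOCK * sizeof(*blocks));
}
```

Both variants leave the buffer in the same state; only the number of calls differs.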

The checkasm results are as follows (the attached patches are needed to
reproduce the test).
Using the loop:
blockdsp.clear_blocks_prores_c: 137.8
blockdsp.clear_blocks_prores_sse: 292.0
blockdsp.clear_blocks_prores_avx: 230.5


Using the new asm function, the results are (Kaby Lake, macOS 10.12, Clang 8.1):
blockdsp.clear_blocks_prores_c: 153.4
blockdsp.clear_blocks_prores_sse: 284.4
blockdsp.clear_blocks_prores_avx: 142.2

The FATE tests pass for me (x86_64).

Since the blocks-per-slice value in the ProRes decoder is always multiplied
by 2 or 4 (depending on the codec), the block count is even, so the asm
function can process two blocks per loop iteration (in the AVX version).
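
The two-blocks-per-iteration idea can be sketched in plain C as below. This is not the asm from the patch: clear_blocks_pairs is a hypothetical name, and memset stands in for the vector stores. One block is 64 int16_t coefficients, i.e. 128 bytes, so a pair is 256 bytes, which would correspond to eight 32-byte stores. The sketch assumes block_count is even; with an odd count the last block would be left untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: clear block_count 8x8 int16_t blocks, two per
 * iteration. Assumes block_count is even; an odd trailing block is
 * NOT cleared. */
static void clear_blocks_pairs(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;
    for (i = 0; i + 1 < block_count; i += 2) {
        /* 2 blocks * 64 coefficients * 2 bytes = 256 bytes per pass */
        memset(blocks + i * 64, 0, 2 * 64 * sizeof(*blocks));
    }
}
```

Halving the iteration count is only safe because the caller guarantees an even number of blocks.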

I have also attached a patch that fixes the comment for the clear_block DSP
function (it now needs 32-byte alignment because of AVX), to avoid a
"dedicated" thread on the mailing list.

Martin
Jokyo Images

Attachment: 0001-libavcodec-blockdsp-fix-comment.-clear_block-need-32.patch
Description: Binary data

Attachment: 0002-libavcodec-blockdsp-add-clear_block_prores.patch
Description: Binary data

Attachment: 0003-libavcodec-proresdec2-use-clear_blocks_prores-for-ea.patch
Description: Binary data

Attachment: 0004-libavcodec-blockdsp-cosmetic-indent.patch
Description: Binary data

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
