Re: [FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Hendrik Leppkes Thu, 05 Oct 2017 09:12:28 -0700

On Thu, Oct 5, 2017 at 4:58 PM, Martin Vignali <[email protected]> wrote:
> Hello,
>
> Attached are patches that add a dedicated func for clear_block inside
> prores decoding (proresdec2).
>
> Currently the slice decode func uses a loop and calls the
> blockdsp.clear_block func.
>
> After some testing, that seems to be slower than memset (for me).
> I checked using these "fake" funcs in the blockdsp:
> static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count)
> {
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_sse(blocks + (i << 6));
> }
>
> static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count)
> {
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_avx(blocks + (i << 6));
> }
>
> The results in checkasm (the attached patch is needed to reproduce the test),
> using the loop:
> blockdsp.clear_blocks_prores_c: 137.8
> blockdsp.clear_blocks_prores_sse: 292.0
> blockdsp.clear_blocks_prores_avx: 230.5
>
>
> Using the new asm func, this is the result (Kaby Lake, os 10.12, Clang 8.1):
> blockdsp.clear_blocks_prores_c: 153.4
> blockdsp.clear_blocks_prores_sse: 284.4
> blockdsp.clear_blocks_prores_avx: 142.2
>
>


This is still slower than the memset numbers from the first test. Why is
there such high variation between the two runs?
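
For reference, the two approaches being compared can be sketched in plain C
roughly as follows. This is only an illustration, not the code from the patch:
the names clear_one_block, clear_blocks_loop and clear_blocks_memset are
hypothetical, and the size of 64 int16_t coefficients per block is an
assumption inferred from the (i<<6) stride in the quoted loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one block holds 64 int16_t coefficients, as implied by the
 * (i << 6) stride in the quoted loop. */
#define COEFFS_PER_BLOCK 64

/* Hypothetical stand-in for a per-block clear routine. */
static void clear_one_block(int16_t *block)
{
    memset(block, 0, COEFFS_PER_BLOCK * sizeof(*block));
}

/* Variant 1: one call per block, as in the quoted "fake" loop funcs. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    assert(block_count >= 0);
    for (ptrdiff_t i = 0; i < block_count; i++)
        clear_one_block(blocks + i * COEFFS_PER_BLOCK);
}

/* Variant 2: clear the whole contiguous run of blocks with one memset. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    assert(block_count >= 0);
    memset(blocks, 0,
           (size_t)block_count * COEFFS_PER_BLOCK * sizeof(*blocks));
}
```

Both variants zero the same memory, so any difference seen in a benchmark
would come from call overhead and how the stores are issued, not from the
amount of work.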

- Hendrik
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
