On 11/10/2017 10:28 AM, James Darnley wrote:
> On 2017-11-09 20:35, Martin Vignali wrote:
>> 2017-11-09 12:58 GMT+01:00 James Darnley <jdarn...@obe.tv>:
>>
>>> From: James Darnley <james.darn...@gmail.com>
>>>
>>> Also adjust alignment requirements where nessecary.
>>> ---
>>> Whether this patch is committed or not the change to 4xm.c should be
>>> picked to master because the alignment is wrong for the AVX version of
>>> this function. I assume it hasn't been noticed yet because it manages
>>> to be 32-byte aligned without intervention.
>>>
>> Thanks for fixing, the 4xm, i miss it in the avx patch
>>
>> Just by curiosity : can you post the checkasm result (i can't test AVX512) ?
>
> I certainly can.
>
>> $ ./tests/checkasm/checkasm --bench --test=blockdsp
>> benchmarking with native FFmpeg timers
>> nop: 26.0
>> checkasm: using random seed 402373647
>> MMX:
>>  - blockdsp.blockdsp [OK]
>> SSE:
>>  - blockdsp.blockdsp [OK]
>> AVX:
>>  - blockdsp.blockdsp [OK]
>> AVX-512:
>>  - blockdsp.blockdsp [OK]
>> checkasm: all 8 tests passed
>> blockdsp.clear_block_c: 23.5
>> blockdsp.clear_block_mmx: 11.5
>> blockdsp.clear_block_sse: 5.5
>> blockdsp.clear_block_avx: 3.0
>> blockdsp.clear_block_avx512: 5.0

This sounds like it's not worth adding.

>> blockdsp.clear_blocks_c: 48.0
>> blockdsp.clear_blocks_mmx: 77.0
>> blockdsp.clear_blocks_sse: 38.0
>> blockdsp.clear_blocks_avx: 18.5
>> blockdsp.clear_blocks_avx512: 11.0

This one is better, but a perf run to check how much CPU time is spent
in this function is needed, because I'm not sure it's important enough
to justify having the CPU throttled just to run avx512 code...
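To put numbers on the two comparisons above, here is a minimal sketch that computes the relative speed of each version from the quoted checkasm timings. The `timings` table and the `speedup` helper are hypothetical names introduced only for this illustration; the values are copied from the benchmark output and nothing else is measured.

```python
# Timings copied from the checkasm output quoted above (lower is faster).
timings = {
    "clear_block": {"c": 23.5, "mmx": 11.5, "sse": 5.5, "avx": 3.0, "avx512": 5.0},
    "clear_blocks": {"c": 48.0, "mmx": 77.0, "sse": 38.0, "avx": 18.5, "avx512": 11.0},
}


def speedup(func, baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`.

    A result above 1 means `candidate` is faster; below 1 means it is slower.
    """
    return timings[func][baseline] / timings[func][candidate]


for func in timings:
    print(
        f"{func}: avx512 vs avx = {speedup(func, 'avx', 'avx512'):.2f}x, "
        f"avx512 vs c = {speedup(func, 'c', 'avx512'):.2f}x"
    )
```

On these numbers the AVX-512 clear_block runs at about 0.6x the speed of the AVX one, while the AVX-512 clear_blocks is about 1.7x faster than AVX. These ratios only cover the isolated function; they say nothing about how much total decode time it accounts for or about the throttling cost, which is what the perf run would have to show.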