On 10/1/2017 8:46 PM, Martin Vignali wrote:
> Hello,
>
> After taking a look on blockdsp
> ./tests/checkasm/checkasm --test=blockdsp --bench
>
> the result of clear_blocks is slower on my computer than the C version
> except if we add an avx version
>
> In attach patch to add avx version
> for clear_block and clear_blocks
>
> result : (Kaby Lake, Mac os 10.12)
> checkasm: all 6 tests passed
> blockdsp.clear_block_c: 15.9
> blockdsp.clear_block_mmx: 16.4
> blockdsp.clear_block_sse: 7.4
> blockdsp.clear_block_avx: 3.9
>
> blockdsp.clear_blocks_c: 29.6
> blockdsp.clear_blocks_mmx: 99.1
> blockdsp.clear_blocks_sse: 48.4
> blockdsp.clear_blocks_avx: 24.4
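For reference, a minimal sketch of what the two benchmarked routines are assumed to do: zero one block of 64 int16_t coefficients, or six such blocks in a row. The names clear_block_ref, clear_blocks_ref and is_aligned are hypothetical stand-ins for illustration, not the FFmpeg functions or the code in the patch. The alignment helper is there because AVX aligned stores on 256-bit registers fault unless the address is a multiple of 32 bytes, which is presumably why the buffers need more than 16-byte alignment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference versions; sizes are an assumption
 * (64 coefficients per block, 6 blocks per macroblock). */
#define COEFFS_PER_BLOCK 64
#define BLOCKS_PER_MB     6

/* Zero a single block of coefficients. */
static void clear_block_ref(int16_t *block)
{
    memset(block, 0, sizeof(int16_t) * COEFFS_PER_BLOCK);
}

/* Zero six consecutive blocks. */
static void clear_blocks_ref(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * BLOCKS_PER_MB * COEFFS_PER_BLOCK);
}

/* Return 1 if p is a multiple of n bytes, 0 otherwise.
 * A SIMD version using aligned 256-bit stores would need n = 32. */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Count the non-zero coefficients in the first len entries. */
static int count_nonzero(const int16_t *p, int len)
{
    int i, n = 0;
    for (i = 0; i < len; i++)
        n += p[i] != 0;
    return n;
}
```

A caller would declare the buffer with 32-byte alignment (for example with C11 _Alignas(32)), check it with is_aligned(buf, 32), and then call either routine. A buffer that is only 16-byte aligned is enough for the SSE versions but not for aligned 256-bit stores.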
On a Haswell, Windows x64, benchmarking with native FFmpeg timers, I get

blockdsp.clear_block_c: 28.0
blockdsp.clear_block_mmx: 14.0
blockdsp.clear_block_sse: 6.0
blockdsp.clear_block_avx: 4.0

blockdsp.clear_blocks_c: 77.0
blockdsp.clear_blocks_mmx: 94.0
blockdsp.clear_blocks_sse: 46.0
blockdsp.clear_blocks_avx: 23.0

I used GCC 7.2. clear_blocks_mmx is slower than C for me as well, but
the other versions are not.

Your compiler seems to have done a much better job than mine with the C
versions. Is it Clang? Does it perhaps have vectorization enabled
somehow? Because that's not supposed to happen.

>
> I also modify several decoder/encoder, in order to fix the DECLARE_ALIGNED
> from 16 to 32
>
> I run make fate SAMPLES=fate-suite/
> i have several errors, but after a check, these errors
> doesn't seems to be related to this patch

Make sure to clean your build folder if you recently pulled new commits
from the git repository. Reconfigure if necessary.

>
> Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel