On 3/18/2018 12:08 PM, Martin Vignali wrote:
> 2018-03-03 18:20 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
>
>> Hello,
>>
>> Patch in attach add SIMD for the 5 shuffle_bytes func for rgb2rgb
>> The new SIMD are write using external ASM.
>>
>> Also add checkasm test for theses func
>> Restricted to x86_64, because the scalar part doesn't compile on x86_32
>>
>> I consider for the scalar part that the src_size value is a multiple of 4
>> (because the shuffle is for 4 bytes)
>>
>> Pass fate test on X86_64 and X86_32 (os 10.12)
>>
>
> New patchs in attach :
> - Now compile on x86_32 and x86_64
> - Add cosmetic patch to put all shuffle_bytes declaration in the same place
>
> Tested on X86_64 and X86_32 (os 10.12)
>
> Checkasm result : ./tests/checkasm/checkasm --test=sw_rgb --bench
>
> checkasm: using random seed 292997963
> MMX:
>  - sw_rgb.shuffle_bytes_2103 [OK]
> MMXEXT:
>  - sw_rgb.shuffle_bytes_2103 [OK]
> SSSE3:
>  - sw_rgb.shuffle_bytes_2103 [OK]
>  - sw_rgb.shuffle_bytes_0321 [OK]
>  - sw_rgb.shuffle_bytes_1230 [OK]
>  - sw_rgb.shuffle_bytes_3012 [OK]
>  - sw_rgb.shuffle_bytes_3210 [OK]
> AVX2:
>  - sw_rgb.shuffle_bytes_2103 [OK]
>  - sw_rgb.shuffle_bytes_0321 [OK]
>  - sw_rgb.shuffle_bytes_1230 [OK]
>  - sw_rgb.shuffle_bytes_3012 [OK]
>  - sw_rgb.shuffle_bytes_3210 [OK]
> checkasm: all 12 tests passed
> shuffle_bytes_0321_c: 51.4
> shuffle_bytes_0321_ssse3: 18.7
> shuffle_bytes_0321_avx2: 12.7
> shuffle_bytes_1230_c: 126.9
> shuffle_bytes_1230_ssse3: 16.7
> shuffle_bytes_1230_avx2: 12.9
> shuffle_bytes_2103_c: 52.4
> shuffle_bytes_2103_mmx: 76.7
> shuffle_bytes_2103_mmxext: 197.2
> shuffle_bytes_2103_ssse3: 17.4
> shuffle_bytes_2103_avx2: 12.4
> shuffle_bytes_3012_c: 127.4
> shuffle_bytes_3012_ssse3: 14.7
> shuffle_bytes_3012_avx2: 12.4
> shuffle_bytes_3210_c: 127.4
> shuffle_bytes_3210_ssse3: 18.2
> shuffle_bytes_3210_avx2: 12.9
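
To put the quoted figures in perspective, here is a minimal sketch that
computes per-function speedup ratios from the checkasm output above. The names
`timings`, `speedup` and `summarize` are made up for illustration and are not
part of FFmpeg or checkasm. The timings are in whatever unit checkasm reports,
so only the ratios mean anything. The sketch also cannot account for any
frequency drop caused by ymm code, since that does not show up in a checkasm
run of a single function.

```python
# Timings copied from the checkasm --bench output quoted above
# (checkasm's own units; lower is faster).
timings = {
    "shuffle_bytes_0321": {"c": 51.4, "ssse3": 18.7, "avx2": 12.7},
    "shuffle_bytes_1230": {"c": 126.9, "ssse3": 16.7, "avx2": 12.9},
    "shuffle_bytes_2103": {"c": 52.4, "ssse3": 17.4, "avx2": 12.4},
    "shuffle_bytes_3012": {"c": 127.4, "ssse3": 14.7, "avx2": 12.4},
    "shuffle_bytes_3210": {"c": 127.4, "ssse3": 18.2, "avx2": 12.9},
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


def summarize(table):
    """Map each function name to (SSSE3 over C, AVX2 over SSSE3) ratios."""
    rows = {}
    for name, t in table.items():
        rows[name] = (
            speedup(t["c"], t["ssse3"]),
            speedup(t["ssse3"], t["avx2"]),
        )
    return rows


if __name__ == "__main__":
    for name, (ssse3_vs_c, avx2_vs_ssse3) in summarize(timings).items():
        print(f"{name}: SSSE3 vs C {ssse3_vs_c:.2f}x, "
              f"AVX2 vs SSSE3 {avx2_vs_ssse3:.2f}x")
```

On these numbers, AVX2 comes out between about 1.2x and 1.5x faster than
SSSE3, while SSSE3 is already at least 2.7x faster than the C version.
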

These AVX2 numbers don't justify the extra code. Some CPUs lower their clock
frequency while executing ymm instructions, so unless the AVX2 version is
considerably faster than the SSE* version, it's usually not worth adding.

>
> Martin
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel