On 3/18/2018 1:23 PM, Martin Vignali wrote:
> 2018-03-18 16:49 GMT+01:00 James Almer <jamr...@gmail.com>:
>
>> On 3/18/2018 12:08 PM, Martin Vignali wrote:
>>> 2018-03-03 18:20 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> The attached patch adds SIMD for the 5 shuffle_bytes funcs for rgb2rgb.
>>>> The new SIMD is written using external ASM.
>>>>
>>>> It also adds a checkasm test for these funcs.
>>>> Restricted to x86_64, because the scalar part doesn't compile on x86_32.
>>>>
>>>> For the scalar part, I assume that the src_size value is a multiple of 4
>>>> (because the shuffle is for 4 bytes).
>>>>
>>>> Passes the fate tests on x86_64 and x86_32 (os 10.12)
>>>>
>>>
>>> New patches attached:
>>> - Now compiles on x86_32 and x86_64
>>> - Adds a cosmetic patch to put all shuffle_bytes declarations in the same place
>>>
>>> Tested on x86_64 and x86_32 (os 10.12)
>>>
>>> Checkasm result: ./tests/checkasm/checkasm --test=sw_rgb --bench
>>>
>>> checkasm: using random seed 292997963
>>> MMX:
>>>  - sw_rgb.shuffle_bytes_2103 [OK]
>>> MMXEXT:
>>>  - sw_rgb.shuffle_bytes_2103 [OK]
>>> SSSE3:
>>>  - sw_rgb.shuffle_bytes_2103 [OK]
>>>  - sw_rgb.shuffle_bytes_0321 [OK]
>>>  - sw_rgb.shuffle_bytes_1230 [OK]
>>>  - sw_rgb.shuffle_bytes_3012 [OK]
>>>  - sw_rgb.shuffle_bytes_3210 [OK]
>>> AVX2:
>>>  - sw_rgb.shuffle_bytes_2103 [OK]
>>>  - sw_rgb.shuffle_bytes_0321 [OK]
>>>  - sw_rgb.shuffle_bytes_1230 [OK]
>>>  - sw_rgb.shuffle_bytes_3012 [OK]
>>>  - sw_rgb.shuffle_bytes_3210 [OK]
>>> checkasm: all 12 tests passed
>>> shuffle_bytes_0321_c: 51.4
>>> shuffle_bytes_0321_ssse3: 18.7
>>> shuffle_bytes_0321_avx2: 12.7
>>> shuffle_bytes_1230_c: 126.9
>>> shuffle_bytes_1230_ssse3: 16.7
>>> shuffle_bytes_1230_avx2: 12.9
>>> shuffle_bytes_2103_c: 52.4
>>> shuffle_bytes_2103_mmx: 76.7
>>> shuffle_bytes_2103_mmxext: 197.2
>>> shuffle_bytes_2103_ssse3: 17.4
>>> shuffle_bytes_2103_avx2: 12.4
>>> shuffle_bytes_3012_c: 127.4
>>> shuffle_bytes_3012_ssse3: 14.7
>>> shuffle_bytes_3012_avx2: 12.4
>>> shuffle_bytes_3210_c: 127.4
>>> shuffle_bytes_3210_ssse3: 18.2
>>> shuffle_bytes_3210_avx2: 12.9
>>
>> These AVX2 numbers are not worth it. Some CPU archs throttle down the
>> frequency when using ymm instructions, so unless the function is
>> considerably faster than the SSE* version then it's usually not worth
>> adding.
>>
>
> I ran the test again with a bigger width (512 instead of 128).
> These are my results:
> shuffle_bytes_0321_c: 128.6
> shuffle_bytes_0321_ssse3: 41.6
> shuffle_bytes_0321_avx2: 23.4
> shuffle_bytes_1230_c: 626.4
> shuffle_bytes_1230_ssse3: 41.6
> shuffle_bytes_1230_avx2: 23.9
> shuffle_bytes_2103_c: 128.4
> shuffle_bytes_2103_mmx: 307.1
> shuffle_bytes_2103_mmxext: 224.6
> shuffle_bytes_2103_ssse3: 72.9
> shuffle_bytes_2103_avx2: 32.9
> shuffle_bytes_3012_c: 620.9
> shuffle_bytes_3012_ssse3: 40.6
> shuffle_bytes_3012_avx2: 36.1
> shuffle_bytes_3210_c: 602.6
> shuffle_bytes_3210_ssse3: 75.4
> shuffle_bytes_3210_avx2: 33.6
>
> So except for the 3012 version (I don't know why), AVX2 is around 2x
> faster than SSSE3.
> Do you still think the AVX2 version needs to be removed?
>
> Martin
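As a quick check of the "around x2" claim, the width-512 timings quoted above can be turned into SSSE3-to-AVX2 ratios. This is only a minimal sketch: the dictionary layout and the `speedup` helper are illustrative names, not part of the patch or of checkasm, and it assumes that lower checkasm numbers mean faster code.

```python
# Timings copied from the width-512 checkasm run quoted in this thread.
# The dictionary layout is an illustrative choice, not a checkasm format.
TIMINGS_512 = {
    "0321": {"c": 128.6, "ssse3": 41.6, "avx2": 23.4},
    "1230": {"c": 626.4, "ssse3": 41.6, "avx2": 23.9},
    "2103": {"c": 128.4, "ssse3": 72.9, "avx2": 32.9},
    "3012": {"c": 620.9, "ssse3": 40.6, "avx2": 36.1},
    "3210": {"c": 602.6, "ssse3": 75.4, "avx2": 33.6},
}


def speedup(timings, baseline, candidate):
    """Return baseline/candidate timing ratios per function (hypothetical helper).

    A ratio above 1.0 means the candidate is faster than the baseline,
    assuming lower timing values are better.
    """
    return {name: t[baseline] / t[candidate] for name, t in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedup(TIMINGS_512, "ssse3", "avx2").items():
        print(f"shuffle_bytes_{name}: AVX2 is {ratio:.2f}x SSSE3")
```

On these numbers, 2103 and 3210 come out a little above 2x, 0321 and 1230 a little below, and 3012 stays close to 1x, which matches the exception noted in the message.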
No, those look good now.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
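For readers unfamiliar with the functions benchmarked in this thread, the following is a rough scalar model of a 4-byte shuffle. It is a sketch under assumptions, not the patch's code: it assumes the digits in a name such as shuffle_bytes_2103 give, for each output byte, the index of the source byte within its 4-byte group, and the function name `shuffle_bytes` and its `order` parameter are hypothetical. Like the scalar part described in the thread, it requires the input size to be a multiple of 4.

```python
from typing import Sequence


def shuffle_bytes(src: bytes, order: Sequence[int]) -> bytes:
    """Reorder every 4-byte group of src according to order.

    order[j] is the index (0..3) of the source byte that lands in
    output position j of each group. This reading of the naming scheme
    is an assumption, not something stated in the thread.
    """
    if len(order) != 4 or any(k not in (0, 1, 2, 3) for k in order):
        raise ValueError("order must hold four indices in the range 0..3")
    if len(src) % 4 != 0:
        raise ValueError("src size must be a multiple of 4")

    out = bytearray(len(src))
    for i in range(0, len(src), 4):      # walk the buffer one group at a time
        for j, k in enumerate(order):    # place source byte k at position j
            out[i + j] = src[i + k]
    return bytes(out)


if __name__ == "__main__":
    pixels = bytes(range(8))             # two 4-byte groups: 0..3 and 4..7
    print(list(shuffle_bytes(pixels, (2, 1, 0, 3))))
```

Under that assumption, the order (2, 1, 0, 3) swaps the first and third byte of each group, and (3, 2, 1, 0) reverses each group.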