On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
> Could you provide standard overall transform results using START/STOP_TIMER
> rather than overall decoding speed?

Ask and ye shall receive.

> haar horizontal compose
>   sse2: 3.67x faster (45248±108.1 vs. 12328±21.1 decicycles) compared with none
>   avx:  3.74x faster (45248±108.1 vs. 12091±11.0 decicycles) compared with none
>   avx2: 5.14x faster (45248±108.1 vs. 8805±15.6 decicycles) compared with none
> haar vertical compose
>   sse2: 1.57x faster (31771±459.9 vs. 20179±786.2 decicycles) compared with none
>   avx:  1.62x faster (31771±459.9 vs. 19572±253.1 decicycles) compared with none
>   avx2: 1.73x faster (31771±459.9 vs. 18337±827.9 decicycles) compared with none
>
> legall vertical hi
>   sse2: 3.68x faster (20506±46.2 vs. 5574±29.7 decicycles) compared with none
>   avx2: 5.96x faster (20506±46.2 vs. 3442±32.7 decicycles) compared with none
> legall vertical lo
>   sse2: 1.52x faster (28360±178.6 vs. 18603±114.8 decicycles) compared with none
>   avx2: 1.64x faster (28360±178.6 vs. 17255±372.3 decicycles) compared with none
>
> dd97 vertical hi
>   sse2: 2.76x faster (31975±103.0 vs. 11570±247.5 decicycles) compared with none
>   avx:  2.82x faster (31975±103.0 vs. 11346±179.0 decicycles) compared with none
>   avx2: 3.83x faster (31975±103.0 vs. 8357±219.6 decicycles) compared with none
> dd97 vertical lo
>   sse2: 1.52x faster (29476±335.8 vs. 19429±518.7 decicycles) compared with none
>   avx2: 1.62x faster (29476±335.8 vs. 18246±559.8 decicycles) compared with none

Here "none" refers to the C functions, as selected by the "-cpuflags none"
option.

I also have the results of removing the C wrappers from these functions,
except dd97. They aren't that much better.

> haar horizontal compose
>   sse2: 3.68x faster (45143±36.4 vs. 12279±16.4 decicycles) compared with none
>   avx:  3.68x faster (45143±36.4 vs. 12275±9.2 decicycles) compared with none
>   avx2: 5.16x faster (45143±36.4 vs. 8742±12.3 decicycles) compared with none
> haar vertical compose
>   sse2: 1.64x faster (31792±367.5 vs. 19377±271.7 decicycles) compared with none
>   avx:  1.58x faster (31792±367.5 vs. 20090±593.9 decicycles) compared with none
>   avx2: 1.66x faster (31792±367.5 vs. 19157±1352.4 decicycles) compared with none
>
> legall vertical hi
>   sse2: 3.86x faster (20201±26.5 vs. 5231±39.0 decicycles) compared with none
>   avx2: 6.70x faster (20201±26.5 vs. 3014±39.1 decicycles) compared with none
> legall vertical lo
>   sse2: 1.50x faster (28345±206.6 vs. 18908±440.3 decicycles) compared with none
>   avx2: 1.63x faster (28345±206.6 vs. 17361±637.9 decicycles) compared with none

I will squash patches, update commit messages, and send a new patch thread.
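
For anyone checking the arithmetic: the "Nx faster" figures above appear to be simply the ratio of the two mean timings. A minimal Python sketch, assuming that is how they are derived (the helper name `speedup` is hypothetical and not part of FFmpeg or any benchmark script):

```python
def speedup(baseline, optimized):
    """Return how many times faster `optimized` is than `baseline`.

    Both arguments are mean timings in the same unit (decicycles here).
    Assumption: the reported factor is a plain ratio of the means.
    """
    if optimized <= 0:
        raise ValueError("optimized timing must be positive")
    return baseline / optimized


# Mean timings copied from the first haar horizontal compose results.
c_mean = 45248      # "none", i.e. the C function
sse2_mean = 12328
avx2_mean = 8805

print(f"sse2: {speedup(c_mean, sse2_mean):.2f}x faster")  # sse2: 3.67x faster
print(f"avx2: {speedup(c_mean, avx2_mean):.2f}x faster")  # avx2: 5.14x faster
```

The ± values are ignored here; they only describe the spread of each measurement and do not enter the ratio.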