On Wed, Jul 6, 2016 at 4:37 AM, Dan Parrot <dan.par...@mail.com> wrote:
> Finish providing SIMD versions for POWER8 VSX of functions in
> libswscale/input.c That should allow trac ticket #5570 to be closed.
> The speedups obtained for the functions are:
>
> abgrToA_c          1.19
> bgr24ToUV_c        1.23
> bgr24ToUV_half_c   1.37
> bgr24ToY_c_vsx     1.43
> nv12ToUV_c         1.05
> nv21ToUV_c         1.06
> planar_rgb_to_uv   1.25
> planar_rgb_to_y    1.26
> rgb24ToUV_c        1.11
> rgb24ToUV_half_c   1.10
> rgb24ToY_c         0.92
> rgbaToA_c          0.88
> uyvyToUV_c         1.05
> uyvyToY_c          1.15
> yuy2ToUV_c         1.07
> yuy2ToY_c          1.17
> yvy2ToUV_c         1.05

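To make the quoted figures easier to compare, here is a minimal Python sketch that turns each ratio into a percentage change and picks out the best case and the regressions. It assumes each figure is old time divided by new time (so a value below 1.0 means the VSX version is slower), which the quoted message does not state explicitly. The names `speedups` and `percent_change` are made up for this illustration and are not part of the patch.

```python
# Ratios copied from the quoted message.
# Assumption: ratio = old_time / new_time, so values below 1.0 are slowdowns.
speedups = {
    "abgrToA_c": 1.19, "bgr24ToUV_c": 1.23, "bgr24ToUV_half_c": 1.37,
    "bgr24ToY_c_vsx": 1.43, "nv12ToUV_c": 1.05, "nv21ToUV_c": 1.06,
    "planar_rgb_to_uv": 1.25, "planar_rgb_to_y": 1.26, "rgb24ToUV_c": 1.11,
    "rgb24ToUV_half_c": 1.10, "rgb24ToY_c": 0.92, "rgbaToA_c": 0.88,
    "uyvyToUV_c": 1.05, "uyvyToY_c": 1.15, "yuy2ToUV_c": 1.07,
    "yuy2ToY_c": 1.17, "yvy2ToUV_c": 1.05,
}


def percent_change(ratio):
    """Convert a speedup ratio into a percentage change (hypothetical helper)."""
    return round((ratio - 1.0) * 100.0, 1)


# Function with the largest ratio, and every function that got slower.
best = max(speedups, key=speedups.get)
regressions = sorted(name for name, r in speedups.items() if r < 1.0)

# Print the table from fastest to slowest, with a signed percentage.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {ratio:.2f}  {percent_change(ratio):+.1f}%")
```

Under that assumption, a 100% improvement would correspond to a ratio of 2.0, which none of the listed functions reaches.
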
SIMD implementations that in the best case improve the speed by 43%
(and in some cases are *slower*) seem barely worth it. One would expect
a proper SIMD implementation to offer 100% or higher increases; at
least that's the general expectation on x86 with SSE/AVX.

So the question here is: is that VSX being bad, or the intrinsics being
bad? What would the speedup be with proper hand-written ASM?
If hand-written ASM can give us the usual 100-200% improvements we
would expect from SIMD, then that is what should generally be favored.

Also, one further thought:
From the commit message, it sounds like you might only be doing this
for the bounty in #5570. Do you plan to maintain these optimizations in
the future?

- Hendrik
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel