fma is generally faster than a separate multiply and add on architectures that provide a native CPU instruction for it. This can be tested via the FP_FAST_FMA macro, which ISO C optionally defines in <math.h>. Although x86 gained this fairly late (from Haswell onwards, so the macro is absent unless an appropriate -march is passed), numerous other architectures support it: https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation.
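For illustration, the compile-time test described above can be sketched as a standalone snippet. This is a minimal sketch and not part of the patch: the names have_fast_fma and horner_eval are hypothetical, and the only assumption is a C99 <math.h>.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not from the patch): returns 1 if the
 * implementation advertises a fast fma() via FP_FAST_FMA, else 0. */
static int have_fast_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical Horner evaluation of
 * coeff[0] + coeff[1]*x + ... + coeff[size-1]*x^(size-1),
 * using the same guard pattern as the patch. */
static double horner_eval(const double *coeff, int size, double x)
{
    double sum;
    int i;

    assert(size > 0);
    sum = coeff[size - 1];
    for (i = size - 2; i >= 0; --i) {
#ifdef FP_FAST_FMA
        /* fused multiply-add: sum * x + coeff[i] with one rounding */
        sum = fma(sum, x, coeff[i]);
#else
        /* fallback: separate multiply and add */
        sum *= x;
        sum += coeff[i];
#endif
    }
    return sum;
}
```

With GCC on x86-64, have_fast_fma() would be expected to return 1 only when the build targets a CPU with FMA (e.g. -march=haswell or -march=native on such a machine). Because fma rounds once instead of twice, the two paths may differ in the last bit for general inputs.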
Concretely, one can expect a speedup of roughly 15-25%, though this is of course heavily architecture dependent. This patch also ensures that as people migrate to newer CPUs, the benefit will slowly trickle in.

I doubt this will cause build failures on broken libms, since I can't imagine a platform where FP_FAST_FMA is defined but the function fma is absent.

Sample benchmark (x86-64, Haswell, GNU/Linux under -march=native):
old:
515828458 decicycles in build_filter (loop 1000), 1024 runs, 0 skips

new (fma):
435866377 decicycles in build_filter (loop 1000), 1024 runs, 0 skips

Tested with FATE.

Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
---
 libswresample/resample.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libswresample/resample.c b/libswresample/resample.c
index 34eb4c0..e61d4c5 100644
--- a/libswresample/resample.c
+++ b/libswresample/resample.c
@@ -33,8 +33,12 @@ static inline double eval_poly(const double *coeff, int size, double x) {
     double sum = coeff[size-1];
     int i;
     for (i = size-2; i >= 0; --i) {
+#ifdef FP_FAST_FMA
+        sum = fma(sum, x, coeff[i]);
+#else
         sum *= x;
         sum += coeff[i];
+#endif
     }
     return sum;
 }
-- 
2.6.4