[FFmpeg-devel] [PATCH] swr/resample: use fma when it is faster

Ganesh Ajjanagadde Sun, 13 Dec 2015 14:00:06 -0800

fma() is faster than a separate multiply and add on architectures that
provide a native CPU instruction for it.
This can be tested via FP_FAST_FMA, a macro that ISO C optionally defines
in <math.h>. Although native support came fairly late in the x86 lineup
(from Haswell onwards, so the macro is absent unless an appropriate -march is
passed), numerous other architectures support it:
https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation.


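Concretely, one can expect a speedup of roughly 15-25%, though this is of
course heavily architecture dependent; the Haswell benchmark below works out
to about 15.5% fewer cycles in build_filter.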

This patch also ensures that as people migrate to newer CPUs, the
benefit will gradually trickle in.

I doubt this will cause build failures with broken libm implementations,
since I can't imagine a platform where FP_FAST_FMA is defined but the fma
function is absent.

Sample benchmark (x86-64, Haswell, GNU/Linux under -march=native)

old:
515828458 decicycles in build_filter (loop 1000),    1024 runs,      0 skips

new (fma):
435866377 decicycles in build_filter (loop 1000),    1024 runs,      0 skips

Tested with FATE.

Signed-off-by: Ganesh Ajjanagadde <[email protected]>
---
 libswresample/resample.c | 4 ++++
 1 file changed, 4 insertions(+)
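
Note for readers trying this outside the tree: below is a minimal standalone
sketch of the same Horner loop, so the FP_FAST_FMA guard can be compiled and
timed in isolation. The name eval_poly_demo and the size assertion are
illustrative assumptions and are not part of libswresample; the loop body
itself mirrors the hunk in this patch.

```c
#include <assert.h>
#include <math.h>

/* Standalone sketch of Horner evaluation:
 *   coeff[0] + coeff[1]*x + ... + coeff[size-1]*x^(size-1)
 * eval_poly_demo is a hypothetical name used only for this example. */
static inline double eval_poly_demo(const double *coeff, int size, double x)
{
    double sum;
    int i;

    assert(size > 0);            /* assumption: at least one coefficient */
    sum = coeff[size - 1];
    for (i = size - 2; i >= 0; --i) {
#ifdef FP_FAST_FMA
        /* <math.h> advertises fma() as generally about as fast as, or
         * faster than, a separate multiply and add: one fused step. */
        sum = fma(sum, x, coeff[i]);
#else
        /* Fallback: separate multiply and add, rounded twice. */
        sum *= x;
        sum += coeff[i];
#endif
    }
    return sum;
}
```

For example, with coefficients {1.0, 2.0, 3.0} and x = 2.0 both branches return
17 exactly. In general the fused path rounds once per step instead of twice, so
results may differ in the last bits between the two builds.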

diff --git a/libswresample/resample.c b/libswresample/resample.c
index 34eb4c0..e61d4c5 100644
--- a/libswresample/resample.c
+++ b/libswresample/resample.c
@@ -33,8 +33,12 @@ static inline double eval_poly(const double *coeff, int size, double x) {
     double sum = coeff[size-1];
     int i;
     for (i = size-2; i >= 0; --i) {
+#ifdef FP_FAST_FMA
+        sum = fma(sum, x, coeff[i]);
+#else
         sum *= x;
         sum += coeff[i];
+#endif
     }
     return sum;
 }
-- 
2.6.4

_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
