On Sun, Dec 13, 2015 at 5:47 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Hi,
>
> On Sun, Dec 13, 2015 at 4:59 PM, Ganesh Ajjanagadde <gajjanaga...@gmail.com>
> wrote:
>>
>> fma is a faster function on architectures supporting a native CPU
>> instruction for it. This may be tested by the ISO C optionally defined
>> FP_FAST_FMA. Although in the x86 lineup this came fairly late (from
>> Haswell onwards, and hence is absent unless an appropriate -march is
>> passed), numerous other architectures support it:
>> https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation.
>>
>> Concretely, one can expect a ~15-25% speedup that is of course heavily
>> architecture dependent.
>>
>> This patch also ensures that as people migrate to newer CPUs, the
>> benefit will slowly trickle in.
>>
>> I doubt this will cause build failures on broken libms since I can't
>> imagine a platform where FP_FAST_FMA is defined but the function fma is
>> absent.
>>
>> Sample benchmark (x86-64, Haswell, GNU/Linux under -march=native)
>>
>> old:
>> 515828458 decicycles in build_filter (loop 1000), 1024 runs, 0 skips
>>
>> new (fma):
>> 435866377 decicycles in build_filter (loop 1000), 1024 runs, 0 skips
>>
>> Tested with FATE.
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>> ---
>>  libswresample/resample.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/libswresample/resample.c b/libswresample/resample.c
>> index 34eb4c0..e61d4c5 100644
>> --- a/libswresample/resample.c
>> +++ b/libswresample/resample.c
>> @@ -33,8 +33,12 @@ static inline double eval_poly(const double *coeff, int size, double x) {
>>      double sum = coeff[size-1];
>>      int i;
>>      for (i = size-2; i >= 0; --i) {
>> +#ifdef FP_FAST_FMA
>> +        sum = fma(sum, x, coeff[i]);
>> +#else
>>          sum *= x;
>>          sum += coeff[i];
>> +#endif
>>      }
>>      return sum;
>>  }
>> --
>> 2.6.4
>
>
> Nope, this is not how we do CPU-specific optimizations.
> Check example implementations in libswresample/x86/*.asm and the related init
> functions plus macros to check for runtime CPU support in
> libswresample/x86/*_init.c. You want to follow that pattern.
No, this is not x86-specific. This is generic code. If I did such a
maneuver, the benefits would apply only to x86, an inferior outcome.

>
> Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel