On Fri, Jul 11, 2014 at 11:32:08AM +0100, Ben Avison wrote:
> The previous implementation targeted DTS Coherent Acoustics, which only
> requires nbits == 4 (fft16()). This case was (and still is) linked directly
> rather than being indirected through ff_fft_calc_vfp(), but now the full
> range from radix-4 up to radix-65536 is available. This benefits other codecs
> such as AAC and AC3.
>
> The implementation is based upon the C version, with each routine larger than
> radix-16 calling a hierarchy of smaller FFT functions, then performing a
> post-processing pass. This pass benefits a lot from loop unrolling to
> counter the long pipelines in the VFP. A relaxed calling standard also
> reduces the overhead of the call hierarchy, and avoiding the excessive
> inlining performed by GCC probably helps with I-cache utilisation too.
>
> I benchmarked the result by measuring the number of gperftools samples that
> hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
> specifically in the FFT routines (fft4() to fft512() and pass()) for the
> same sample AAC stream:
>
>               Before          After
>               Mean    StdDev  Mean    StdDev  Confidence  Change
> Audio decode  2245.5  53.1    1599.6  43.8    100.0%      +40.4%
> FFT routines  940.6   22.0    348.1   20.8    100.0%      +170.2%
> ---
>  libavcodec/arm/fft_init_arm.c |   8 +-
>  libavcodec/arm/fft_vfp.S      | 284 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 275 insertions(+), 17 deletions(-)

merged a variant of this patch

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel