On 03.09.2014, at 08:38, Pascal Massimino <pascal.massim...@gmail.com> wrote:
> On Tue, Sep 2, 2014 at 10:26 PM, Reimar Döffinger <reimar.doeffin...@gmx.de>
> wrote:
>
>> On 03.09.2014, at 00:49, Pascal Massimino <pascal.massim...@gmail.com>
>> wrote:
>>> On Tue, Sep 2, 2014 at 9:39 AM, Michael Niedermayer <michae...@gmx.at>
>>> wrote:
>>>
>>>
>>> [ahem: ffmpeg doesn't feel like using intrinsics, by chance?]
>>
>> I tried that about 5 months back, once more.
>> It still results in code that is slower than the plain C version, even
>> when using SIMD, on trivial NEON audio format conversion (same thing in asm
>> was about 8x faster).
>> So you can get the same effect with less effort by just disabling asm code.
>>
>
> strange. I exclusively used intrinsics for libwebp (x86, but also
> neon/aarch64) and was pretty pleased with the result (say <2% perf loss,
> but 10x easier maintenance and friendliness to non-guru contributors).
I guess you never used uint16x8x2_t and similar types then, because almost
any access to them seems to go via the stack.
See the last file of
http://lists-archives.com/mplayer-dev-eng/38036-add-neon-optimizations-to-some-critical-audio-functions.html :
it spilled the data to the stack twice per loop iteration.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
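For readers unfamiliar with the multi-vector NEON types mentioned above, here is a minimal sketch of the kind of trivial audio format conversion under discussion: splitting interleaved stereo 16-bit samples into two planar buffers. It is an illustration only and does not reproduce the code from the linked patch. The function name deinterleave_s16 is hypothetical, and it uses int16x8x2_t (the signed sibling of uint16x8x2_t) as returned by vld2q_s16. The NEON path is compiled only when __ARM_NEON is defined; elsewhere the scalar loop does all the work. Whether a given compiler keeps the int16x8x2_t value in registers or spills it to the stack depends on the compiler and its version, so inspecting the generated assembly (e.g. with -S) is the only reliable check.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split interleaved stereo s16 (L R L R ...) into
 * separate left/right planes. */
void deinterleave_s16(const int16_t *in, int16_t *left, int16_t *right,
                      size_t frames)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 8 frames per iteration: vld2q_s16 de-interleaves 16 samples into
     * v.val[0] (left) and v.val[1] (right). This two-register struct is
     * the kind of value that may end up going through the stack. */
    for (; i + 8 <= frames; i += 8) {
        int16x8x2_t v = vld2q_s16(in + 2 * i);
        vst1q_s16(left + i, v.val[0]);
        vst1q_s16(right + i, v.val[1]);
    }
#endif
    /* Scalar tail (or the whole job on non-NEON targets). */
    for (; i < frames; i++) {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```

Calling it with 10 frames exercises both the 8-wide NEON loop and the 2-frame scalar tail on ARM builds.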