Reimar,
On Wed, Sep 3, 2014 at 9:16 AM, Reimar Döffinger <reimar.doeffin...@gmx.de> wrote:
> On 03.09.2014, at 08:38, Pascal Massimino <pascal.massim...@gmail.com> wrote:
> > On Tue, Sep 2, 2014 at 10:26 PM, Reimar Döffinger <reimar.doeffin...@gmx.de> wrote:
> >
> >> On 03.09.2014, at 00:49, Pascal Massimino <pascal.massim...@gmail.com> wrote:
> >>> On Tue, Sep 2, 2014 at 9:39 AM, Michael Niedermayer <michae...@gmx.at> wrote:
> >>>
> >>>
> >>> [ahem: ffmpeg doesn't feel like using intrinsics, by chance?]
> >>
> >> I tried that about 5 months back, once more.
> >> It still results in code that is slower than the plain C version, even
> >> when using SIMD, on trivial NEON audio format conversion (the same thing
> >> in asm was about 8x faster).
> >> So you can get the same effect with less effort by just disabling the
> >> asm code.
> >>
> >
> > Strange. I used intrinsics exclusively for libwebp (x86, but also
> > NEON/AArch64) and was pretty pleased with the result (say <2% perf loss,
> > but 10x easier maintenance and friendliness to non-guru contributors).
>
> I guess you never used uint16x8x2 and similar types then, because almost
> any access to them seems to go via the stack.
> See the last file of
> http://lists-archives.com/mplayer-dev-eng/38036-add-neon-optimizations-to-some-critical-audio-functions.html
> ; it spilled the data to the stack twice per loop iteration.

Indeed, I just tried compiling the patch (gcc 4.8.3) and the output is
rather bad. It's likely related to the poor support for post-increment
addressing; I've noticed that on several occasions.
But on the bright side, things seem to be moving in the right direction,
e.g.: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00122.html

/skal

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel