Re: [FFmpeg-devel] [PATCH] SSE2 version of vf_idet's filter_line()

Reimar Döffinger Wed, 03 Sep 2014 00:16:54 -0700

On 03.09.2014, at 08:38, Pascal Massimino <[email protected]> wrote:
> On Tue, Sep 2, 2014 at 10:26 PM, Reimar Döffinger <[email protected]>
> wrote:
> 
>> On 03.09.2014, at 00:49, Pascal Massimino <[email protected]>
>> wrote:
>>> On Tue, Sep 2, 2014 at 9:39 AM, Michael Niedermayer <[email protected]>
>>> wrote:
>>> 
>>> 
>>> [ahem: ffmpeg doesn't feel like using intrinsics, by chance?]
>> 
>> I tried that about 5 months back, once more.
>> It still results in code that is slower than the plain C version, even
>> when using SIMD, on trivial NEON audio format conversion (same thing in asm
>> was about 8x faster).
>> So you can get the same effect with less effort by just disabling
>> asm code.
>> 
> 
> Strange. I exclusively used intrinsics for libwebp (x86, but also
> neon/aarch64) and was pretty pleased with the result (say <2% perf loss,
> but 10x easier maintenance and friendliness to non-guru contributors).


I guess you never used uint16x8x2_t and similar types then, because almost any
access to them seems to go via the stack.
See the last file of
http://lists-archives.com/mplayer-dev-eng/38036-add-neon-optimizations-to-some-critical-audio-functions.html
where the data was spilled to the stack twice per loop iteration.
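
For readers unfamiliar with these types, here is a minimal sketch of the kind
of intrinsics code being discussed. It is not the code from the linked patch:
the function name deinterleave_stereo_u16 and the stereo-deinterleave task are
assumptions chosen for illustration. vld2q_u16 returns a uint16x8x2_t, and the
accesses to its .val[] members are where stack traffic can show up. A scalar
fallback keeps the sketch buildable on non-NEON targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the patch): split interleaved stereo
 * samples (L R L R ...) into two planar buffers of `frames` samples each. */
static void deinterleave_stereo_u16(const uint16_t *src, uint16_t *left,
                                    uint16_t *right, size_t frames)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process 8 frames (16 samples) per iteration. */
    for (; i + 8 <= frames; i += 8) {
        /* De-interleaving load: val[0] gets the even-indexed samples,
         * val[1] the odd-indexed ones. */
        uint16x8x2_t lr = vld2q_u16(src + 2 * i);
        vst1q_u16(left + i, lr.val[0]);
        vst1q_u16(right + i, lr.val[1]);
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < frames; i++) {
        left[i]  = src[2 * i];
        right[i] = src[2 * i + 1];
    }
}
```

Whether a given compiler keeps lr in registers or routes it through the stack
depends on the compiler and its version, so the only reliable check is to
inspect the generated assembly (for example by compiling with -S).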
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
