Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

Sebastian Pop Wed, 27 Nov 2019 12:29:11 -0800

On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch <[email protected]> wrote:
> Yeah I will by the end of the week. I wrote that a few years ago so I need
> to take some time to get back in the context.


Thanks Clément for your help.

>
> BTW, that's quite a huge speed improvement you're bringing in, are you
> sure you are always allowed to read up to filter[3]?

I will check.
Otherwise we can version the code and keep the existing code alongside
the new code for a vector factor of 2.
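
To make the filter[3] question and the versioning idea concrete, here is a
minimal scalar sketch in C. It is not the NEON code from the patch. All
function names are hypothetical, and the layout (filter_size coefficients per
output sample, a >> 7 shift, a clamp to 15 bits) is an assumption modeled on
the scalar C path in libswscale. The point is only that a loop consuming four
taps per step touches filter[j+3], so it is safe only when the filter size is
a multiple of 4, and other sizes need to fall back to the existing path.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 15-bit output range (assumed to match the scalar path). */
static int16_t clip15(int val)
{
    return val > 32767 ? 32767 : (int16_t)val;
}

/* Reference loop: reads exactly filter_size taps per output sample. */
static void hscale_ref(int16_t *dst, int dst_w, const uint8_t *src,
                       const int16_t *filter, const int32_t *filter_pos,
                       int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = clip15(val >> 7);
    }
}

/* Four taps per step: reads f[j+3] and s[j+3], so it must only be
 * called when filter_size is a multiple of 4. */
static void hscale_x4(int16_t *dst, int dst_w, const uint8_t *src,
                      const int16_t *filter, const int32_t *filter_pos,
                      int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        const int16_t *f = filter + filter_size * i;
        const uint8_t *s = src + filter_pos[i];
        int val = 0;
        for (int j = 0; j < filter_size; j += 4)
            val += s[j] * f[j] + s[j + 1] * f[j + 1]
                 + s[j + 2] * f[j + 2] + s[j + 3] * f[j + 3];
        dst[i] = clip15(val >> 7);
    }
}

/* "Versioned" entry point (hypothetical): pick the 4-tap loop only when
 * it cannot read past the coefficients of the current output sample. */
static void hscale_versioned(int16_t *dst, int dst_w, const uint8_t *src,
                             const int16_t *filter, const int32_t *filter_pos,
                             int filter_size)
{
    assert(filter_size > 0);
    if (filter_size % 4 == 0)
        hscale_x4(dst, dst_w, src, filter, filter_pos, filter_size);
    else
        hscale_ref(dst, dst_w, src, filter, filter_pos, filter_size);
}
```

With filter_size == 2, hscale_versioned takes the reference loop, which is
the case the "keep the existing code" fallback is meant to cover.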

>
> Last thing: this same optimization was also written for arm following the
> same pattern. You may want to adjust that one as well while waiting for my
> review :)

Thanks for pointing it out.  I can submit a separate patch for that.

I have also seen that ff_yuv2planeX_8_neon in libswscale/aarch64/output.S
could be improved in a similar way. That function is on the critical
path for multi-threaded encodes and shows up in the linux-perf
profiles.

Sebastian
