Re: [FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

2020-01-04 Thread Michael Niedermayer
On Sat, Jan 04, 2020 at 05:53:34PM +0100, Clément Bœsch wrote:
> On Tue, Dec 10, 2019 at 04:38:25PM -0600, Sebastian Pop wrote:
> > Hi,
> >
> > This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
> > horizontal adds by using fused multiply adds. The patch also use

Re: [FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

2020-01-04 Thread Clément Bœsch
On Tue, Dec 10, 2019 at 04:38:25PM -0600, Sebastian Pop wrote:
> Hi,
>
> This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
> horizontal adds by using fused multiply adds. The patch also uses ld1r to load
> one element and replicate it across all lanes of the vecto

Re: [FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

2019-12-25 Thread Sebastian Pop
On Mon, Dec 16, 2019 at 3:56 PM Jean-Baptiste Kempf wrote:
>
> On Tue, Dec 10, 2019, at 23:38, Sebastian Pop wrote:
>> Please let me know how I can improve the patch.
>
> No remarks from me.

Clément, any further feedback to improve the patch?
Ok to commit?

Thanks,
Sebastian

Re: [FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

2019-12-16 Thread Jean-Baptiste Kempf
On Tue, Dec 10, 2019, at 23:38, Sebastian Pop wrote:
> Please let me know how I can improve the patch.

No remarks from me.

--
Jean-Baptiste Kempf - President
+33 672 704 734

[FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

2019-12-10 Thread Sebastian Pop
Hi,

This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and horizontal adds by using fused multiply adds. The patch also uses ld1r to load one element and replicate it across all lanes of the vector. The patch also improves the clipping code by removing the shift right ins
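
For readers who have not seen the patch itself, the loop reordering described above can be sketched in portable C. This is not the NEON code from the patch: the function names, the `LANES` width, and the `shift` parameter are all hypothetical, and dithering is left out. The first function sums over the filter taps for one pixel at a time, which in SIMD form ends in a horizontal add. The second broadcasts one coefficient (the role ld1r plays in the description) and multiply-accumulates it into a group of independent per-pixel accumulators, so no cross-lane reduction is needed.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* hypothetical vector width, chosen for illustration */

/* Shift right and clamp to 0..255. Negative sums clamp to 0 before the
 * shift, so no right shift of a negative value is ever performed. */
static uint8_t shift_clip_u8(int32_t acc, int shift)
{
    if (acc < 0)
        return 0;
    acc >>= shift;
    return (uint8_t)(acc > 255 ? 255 : acc);
}

/* Pixel-major order: one accumulator per pixel, reduced over all taps.
 * Vectorizing the inner loop requires a horizontal add at the end. */
static void planex_pixel_major(const int16_t *filter, int filter_size,
                               const int16_t *const *src, uint8_t *dest,
                               int width, int shift)
{
    for (int i = 0; i < width; i++) {
        int32_t acc = 0;
        for (int j = 0; j < filter_size; j++)
            acc += (int32_t)src[j][i] * filter[j];
        dest[i] = shift_clip_u8(acc, shift);
    }
}

/* Tap-major order: each coefficient is broadcast and accumulated into
 * LANES independent per-pixel sums, so lanes never need to be combined. */
static void planex_tap_major(const int16_t *filter, int filter_size,
                             const int16_t *const *src, uint8_t *dest,
                             int width, int shift)
{
    assert(width % LANES == 0); /* sketch only: no tail handling */
    for (int i = 0; i < width; i += LANES) {
        int32_t acc[LANES] = {0};
        for (int j = 0; j < filter_size; j++) {
            const int32_t coeff = filter[j]; /* stands in for the broadcast load */
            for (int l = 0; l < LANES; l++)
                acc[l] += coeff * src[j][i + l]; /* lane-wise multiply-accumulate */
        }
        for (int l = 0; l < LANES; l++)
            dest[i + l] = shift_clip_u8(acc[l], shift);
    }
}
```

Both functions compute the same output for the same input; only the order of the multiply-accumulate operations differs, which is what lets a vector version keep one pixel per lane.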