Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Pop, Sebastian wrote:
Here are a few suggestions:
+add d18, d17, d18 // add to the end result register
[...]
+mov w0, v18.S[0] // copy result to general purpose register
I think you can use 32-bit register s18 instead

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Swinney, Jonathan wrote:
- ff_pix_abs16_neon
- ff_pix_abs16_xy2_neon
In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 2:
ff_pix_abs16_neon:
c: benchmark ran 10 iterations in 0.955383 s
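
For readers unfamiliar with what a pix_abs16-style function computes, the following is a minimal scalar sketch of a sum of absolute differences (SAD) over a block 16 pixels wide and h rows tall. It is not FFmpeg's actual C implementation and not the code from the patch; the name sad16_ref and its signature are hypothetical and chosen only for illustration. The xy2 variant, which interpolates before comparing, is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not FFmpeg code): sum of absolute
 * differences between two blocks that are 16 pixels wide and h rows tall.
 * stride is the distance in bytes between the starts of consecutive rows. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        /* Accumulate |pix1[x] - pix2[x]| across one 16-pixel row. */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        /* Advance both pointers to the next row. */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

A NEON version would typically compute the per-byte absolute differences for a whole row at once and accumulate them in vector lanes, reducing to a single scalar only at the end, which is where the final move to a general purpose register discussed in this thread comes in.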