On Sat, Jan 04, 2020 at 05:53:34PM +0100, Clément Bœsch wrote:
> On Tue, Dec 10, 2019 at 04:38:25PM -0600, Sebastian Pop wrote:
> > Hi,
> >
> > This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
> > horizontal adds by using fused multiply adds. The patch also uses ld1r to load
> > one element and replicate it across all lanes of the vector. The patch also
> > improves the clipping code by removing the shift right instructions and
> > performing the shift with the shift-right narrow instructions.
> >
> > I see 8% better performance on an m6g instance with neoverse-n1 CPUs:
> > $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
> > before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
> > after: t:0.012985 avg:0.013013 max:0.013996 min:0.012818
> >
> > Tested with `make check` on aarch64-linux.
> >
> > Please let me know how I can improve the patch.
> >
>
> Looks nice. I can't test currently but LGTM.

will apply

thx

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin
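
For readers without the patch at hand, here is a minimal scalar C sketch of the idea in the patch description quoted above. It is not the NEON code from the patch. It only models the two points the description makes: each coefficient is loaded once and reused by every lane (the role the description gives to ld1r), with one accumulator per output pixel so no interleaving or horizontal add is needed, and the final clip is folded into a single saturating shift-and-narrow step. The names planeX_ref, planeX_lanes, shift_narrow_u8 and LANES are hypothetical, and the shift amounts 12 and 19 are assumptions modeled on swscale's scalar code that should be checked against the tree.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* hypothetical vector width used by this model */

/* Scalar meaning of a saturating shift-right-narrow from signed 32-bit
 * to unsigned 8-bit: shift, then clamp to 0..255 in one step. */
static uint8_t shift_narrow_u8(int32_t v, int shift)
{
    assert(shift >= 0 && shift < 31);
    if (v < 0)
        return 0;                 /* any negative value saturates to 0 */
    int32_t s = v >> shift;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Straightforward per-pixel reference (shift amounts are assumptions). */
static void planeX_ref(const int16_t *filter, int filter_size,
                       const int16_t **src, uint8_t *dest, int dst_w,
                       const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int32_t val = (int32_t)dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = shift_narrow_u8(val, 19);
    }
}

/* Lane-wise model: one accumulator per output pixel, one coefficient
 * broadcast to all lanes per filter tap, multiply-accumulate per lane. */
static void planeX_lanes(const int16_t *filter, int filter_size,
                         const int16_t **src, uint8_t *dest, int dst_w,
                         const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i += LANES) {
        int32_t acc[LANES];
        int n = dst_w - i < LANES ? dst_w - i : LANES; /* handle the tail */

        for (int l = 0; l < n; l++)
            acc[l] = (int32_t)dither[(i + l + offset) & 7] << 12;

        for (int j = 0; j < filter_size; j++) {
            int16_t coeff = filter[j];        /* loaded once, used by all lanes */
            for (int l = 0; l < n; l++)
                acc[l] += src[j][i + l] * coeff;
        }

        for (int l = 0; l < n; l++)
            dest[i + l] = shift_narrow_u8(acc[l], 19);
    }
}
```

Both functions should produce identical output for the same inputs; the second only reorders the work so that the inner loop runs across pixels rather than across filter taps.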
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".