On Sat, Jan 04, 2020 at 05:53:34PM +0100, Clément Bœsch wrote:
> On Tue, Dec 10, 2019 at 04:38:25PM -0600, Sebastian Pop wrote:
> > Hi,
> > 
> > This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid
> > zips and horizontal adds by using fused multiply adds. The patch also uses
> > ld1r to load one element and replicate it across all lanes of the vector.
> > The patch also improves the clipping code by removing the shift right
> > instructions and performing the shift with the shift-right narrow
> > instructions.
> > 
> > I see 8% better performance on an m6g instance with neoverse-n1 CPUs:
> > $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 \
> >     -vf bench=start,scale=1024x1024,bench=stop -f null -
> > before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
> > after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818
> > 
> > Tested with `make check` on aarch64-linux.
> > 
> > Please let me know how I can improve the patch.
> > 
> 
> Looks nice. I can't test currently but LGTM.
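
For readers less familiar with the NEON code, the loop restructuring described in the patch can be modeled in portable C. This is a hedged sketch, not the patch itself. `planeX_ref` is modeled on the scalar C fallback in libswscale; the dither shift of 12 and the final shift of 19 are recalled from that code and should be checked against the source. `planeX_lanes`, `LANES`, and the `shrn_*` helpers are hypothetical names. The lane model handles 8 output pixels at a time, broadcasts each filter tap to every lane (what `ld1r` does), and accumulates with a lane-wise multiply-add, so no zips or horizontal adds are needed. The final clip is two saturating shift-right-narrow steps (16 bits, then 3), in the spirit of AArch64 SQSHRUN and UQSHRN; whether the patch uses exactly these instructions is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* hypothetical: 8 x 16-bit samples per 128-bit vector */

static uint8_t clip_uint8(int32_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model, after libswscale's C fallback (assumed details):
 * dither << 12, sum of src[j][i] * filter[j], then >> 19 and clip.
 * Relies on arithmetic right shift of negative ints, as FFmpeg does. */
static void planeX_ref(const int16_t *filter, int filterSize,
                       const int16_t **src, uint8_t *dest, int dstW,
                       const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int32_t val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}

/* Like SQSHRUN: signed 32-bit -> unsigned 16-bit, shift right, saturate. */
static uint16_t shrn_s32_u16_sat(int32_t v, int shift)
{
    int32_t s = v >> shift;
    return s < 0 ? 0 : s > 65535 ? 65535 : (uint16_t)s;
}

/* Like UQSHRN: unsigned 16-bit -> unsigned 8-bit, shift right, saturate. */
static uint8_t shrn_u16_u8_sat(uint16_t v, int shift)
{
    int s = v >> shift;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Lane model of the restructured loop: no zips, no horizontal adds. */
static void planeX_lanes(const int16_t *filter, int filterSize,
                         const int16_t **src, uint8_t *dest, int dstW,
                         const uint8_t *dither, int offset)
{
    assert(dstW % LANES == 0); /* the sketch ignores tail handling */
    for (int i = 0; i < dstW; i += LANES) {
        int32_t acc[LANES];
        for (int l = 0; l < LANES; l++)      /* dither seeds each lane */
            acc[l] = dither[(i + l + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++) {
            int16_t coeff[LANES];
            for (int l = 0; l < LANES; l++)  /* broadcast one tap, as ld1r does */
                coeff[l] = filter[j];
            for (int l = 0; l < LANES; l++)  /* lane-wise widening multiply-accumulate */
                acc[l] += src[j][i + l] * coeff[l];
        }
        for (int l = 0; l < LANES; l++)      /* clip via two narrowing shifts: 16 + 3 = 19 */
            dest[i + l] = shrn_u16_u8_sat(shrn_s32_u16_sat(acc[l], 16), 3);
    }
}
```

Both functions should agree whenever dstW is a multiple of 8. A negative sum becomes 0 either way, and a sum too large for the first step saturates at 65535, which is still 8191 after the 3-bit shift and so saturates to 255, matching the single clip in the scalar model.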

will apply

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
