hscale is bound by the number of multiply-adds a single core can issue. The attached patch doubles the multiply-adds available to it by distributing half the load to a helper thread.
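For readers who want a picture of the idea, here is a minimal sketch of splitting a horizontal scale between the calling thread and one helper. It is not the code from the attached patch: `HScaleJob`, `hscale_range` and `hscale_two_threads` are hypothetical names, the inner loop is a plain C 8-bit to 15-bit filter rather than the NEON routine, and the helper is created per call only to keep the example short (a real implementation would presumably keep a persistent helper thread).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job description: one contiguous range of output pixels. */
typedef struct HScaleJob {
    int16_t       *dst;         /* output row, 15-bit intermediate samples */
    const uint8_t *src;         /* input row, 8-bit samples */
    const int16_t *filter;      /* filter_size coefficients per output pixel */
    const int32_t *filter_pos;  /* first source sample for each output pixel */
    int            filter_size;
    int            begin, end;  /* half-open range [begin, end) of outputs */
} HScaleJob;

/* Scalar multiply-add loop over one range of output pixels. */
static void hscale_range(const HScaleJob *job)
{
    for (int i = job->begin; i < job->end; i++) {
        const int16_t *coeffs = job->filter + (size_t)i * job->filter_size;
        int src_pos = job->filter_pos[i];
        int val = 0;

        for (int j = 0; j < job->filter_size; j++)
            val += job->src[src_pos + j] * coeffs[j];

        val >>= 7;  /* coefficients are assumed to sum to 1 << 7 here */
        job->dst[i] = val > INT16_MAX ? INT16_MAX : (int16_t)val;
    }
}

static void *hscale_worker(void *arg)
{
    hscale_range(arg);
    return NULL;
}

/* Give the upper half of the row to a helper thread, do the lower half here. */
static void hscale_two_threads(int16_t *dst, int dst_w, const uint8_t *src,
                               const int16_t *filter, const int32_t *filter_pos,
                               int filter_size)
{
    int mid = dst_w / 2;
    HScaleJob lo = { dst, src, filter, filter_pos, filter_size, 0, mid };
    HScaleJob hi = { dst, src, filter, filter_pos, filter_size, mid, dst_w };
    pthread_t helper;

    if (pthread_create(&helper, NULL, hscale_worker, &hi) != 0) {
        /* No helper available: fall back to doing everything on this thread. */
        hscale_range(&hi);
        hscale_range(&lo);
        return;
    }
    hscale_range(&lo);
    pthread_join(helper, NULL);  /* both halves are complete after this */
}
```

The two halves write disjoint parts of `dst` and only read the shared inputs, so no locking is needed beyond the final join.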
The performance improves by up to 50% on Graviton2 (Arm Neoverse-N1) processors.

$ ./ffmpeg_g -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -

scale=1024x1024 38% improvement
before: [bench @ 0xaaaad62c3d30] t:0.013293 avg:0.013315 max:0.013697 min:0.013293
after:  [bench @ 0xaaaae9346d30] t:0.009637 avg:0.009691 max:0.010005 min:0.009637

scale=1280x720 49% improvement
before: [bench @ 0xaaaadba88d30] t:0.015973 avg:0.016321 max:0.016917 min:0.015973
after:  [bench @ 0xaaaabc78dd30] t:0.010823 avg:0.010869 max:0.011552 min:0.010708

scale=852x480 45% improvement
before: [bench @ 0xaaaaeeed0d30] t:0.013731 avg:0.013727 max:0.013773 min:0.013279
after:  [bench @ 0xaaaaf5f5dd30] t:0.009279 avg:0.009296 max:0.009328 min:0.009187

scale=640x360 45% improvement
before: [bench @ 0xaaaacee25d30] t:0.012010 avg:0.012006 max:0.012053 min:0.011653
after:  [bench @ 0xaaaaea2b5d30] t:0.008077 avg:0.008084 max:0.008409 min:0.008057

scale=284x160 36% improvement
before: [bench @ 0xaaaadbb9ed30] t:0.008384 avg:0.008367 max:0.008421 min:0.008193
after:  [bench @ 0xaaaafb1d6d30] t:0.006099 avg:0.006100 max:0.006120 min:0.006026
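The post does not say how the improvement percentages were computed. They appear consistent with the ratio of the min times, before / after - 1, rounded to the nearest percent. The sketch below reproduces that arithmetic under that assumption; `improvement_percent` and the `runs` table are illustrative names, and the numbers are copied from the min fields above.

```c
#include <stdio.h>

/* min times in seconds, copied from the bench output in this mail */
static const struct {
    const char *scale;
    double before_min;
    double after_min;
} runs[] = {
    { "1024x1024", 0.013293, 0.009637 },
    { "1280x720",  0.015973, 0.010708 },
    { "852x480",   0.013279, 0.009187 },
    { "640x360",   0.011653, 0.008057 },
    { "284x160",   0.008193, 0.006026 },
};

/* Improvement as before/after - 1, in percent, rounded to nearest.
 * Assumes after <= before, so the result is never negative. */
static int improvement_percent(double before, double after)
{
    double pct = (before / after - 1.0) * 100.0;
    return (int)(pct + 0.5);
}

static void print_improvements(void)
{
    for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("scale=%s %d%% improvement\n", runs[i].scale,
               improvement_percent(runs[i].before_min, runs[i].after_min));
}
```

Using the avg field instead gives slightly different figures, so the choice of column matters when comparing against other measurements.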
0001-aarch64-improve-hscale-by-50-with-multi-threading.patch