hscale (swscale's horizontal scaler) is bound by the number of multiply-adds a
single core can issue. The attached patch doubles the multiply-add throughput
available to it by handing half of the work to a helper thread.
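To make the idea concrete, here is a minimal C sketch of splitting one hscale call between the calling thread and a helper thread. It is not the code from the attached patch: the struct `HScaleSlice` and the functions `hscale_slice`, `hscale_worker` and `hscale_two_threads` are hypothetical, and the per-pixel arithmetic is modeled on swscale's C reference `hScale8To15_c` (8-bit input, 15-bit output, `filterSize` multiply-adds per output pixel). Each output pixel depends only on `src`, `filter` and `filterPos`, so the two halves of `dst` can be computed independently without locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical argument bundle describing one slice of output pixels. */
typedef struct {
    int16_t *dst;
    const uint8_t *src;
    const int16_t *filter;     /* filterSize taps per output pixel */
    const int32_t *filterPos;  /* first source pixel used by each output pixel */
    int filterSize;
    int start, end;            /* output range [start, end) */
} HScaleSlice;

/* Horizontal FIR over [start, end): filterSize multiply-adds per output pixel.
 * Arithmetic follows swscale's C reference hScale8To15_c. */
static void hscale_slice(const HScaleSlice *s)
{
    for (int i = s->start; i < s->end; i++) {
        int32_t val = 0;
        for (int j = 0; j < s->filterSize; j++)
            val += (int32_t)s->src[s->filterPos[i] + j] *
                   s->filter[s->filterSize * i + j];
        val >>= 7;
        if (val > (1 << 15) - 1)   /* clip like FFMIN(val >> 7, (1 << 15) - 1) */
            val = (1 << 15) - 1;
        s->dst[i] = (int16_t)val;
    }
}

/* pthread entry point: run one slice. */
static void *hscale_worker(void *arg)
{
    hscale_slice((const HScaleSlice *)arg);
    return NULL;
}

/* Hypothetical two-way split: the helper thread computes the second half of
 * dst while the calling thread computes the first half.
 * Returns 0 if the helper was used, -1 if it fell back to the calling thread
 * (dst is fully written in both cases). */
static int hscale_two_threads(int16_t *dst, int dstW, const uint8_t *src,
                              const int16_t *filter, const int32_t *filterPos,
                              int filterSize)
{
    assert(filterSize > 0 && dstW >= 0);
    int mid = dstW / 2;
    HScaleSlice first  = { dst, src, filter, filterPos, filterSize, 0, mid };
    HScaleSlice second = { dst, src, filter, filterPos, filterSize, mid, dstW };
    pthread_t helper;

    if (pthread_create(&helper, NULL, hscale_worker, &second) != 0) {
        hscale_slice(&first);      /* no helper available: do all the work here */
        hscale_slice(&second);
        return -1;
    }
    hscale_slice(&first);          /* runs concurrently with the helper */
    pthread_join(helper, NULL);    /* wait until the second half is written */
    return 0;
}
```

Creating a thread on every call, as the sketch does for brevity, would likely cost more than it saves at per-line granularity; a practical version would presumably keep one helper thread alive and hand it work through a queue or barrier.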

Performance improves by up to 50% on Graviton2 (Arm Neoverse-N1) processors.

$ ./ffmpeg_g -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: [bench @ 0xaaaad62c3d30] t:0.013293 avg:0.013315 max:0.013697 min:0.013293
after:  [bench @ 0xaaaae9346d30] t:0.009637 avg:0.009691 max:0.010005 min:0.009637
38% improvement

The runs below change only the scale= target. Each improvement figure is
min(before)/min(after) - 1.

scale=1280x720  49% improvement
before: [bench @ 0xaaaadba88d30] t:0.015973 avg:0.016321 max:0.016917 min:0.015973
after:  [bench @ 0xaaaabc78dd30] t:0.010823 avg:0.010869 max:0.011552 min:0.010708

scale=852x480  45% improvement
before: [bench @ 0xaaaaeeed0d30] t:0.013731 avg:0.013727 max:0.013773 min:0.013279
after:  [bench @ 0xaaaaf5f5dd30] t:0.009279 avg:0.009296 max:0.009328 min:0.009187

scale=640x360  45% improvement
before: [bench @ 0xaaaacee25d30] t:0.012010 avg:0.012006 max:0.012053 min:0.011653
after:  [bench @ 0xaaaaea2b5d30] t:0.008077 avg:0.008084 max:0.008409 min:0.008057

scale=284x160  36% improvement
before: [bench @ 0xaaaadbb9ed30] t:0.008384 avg:0.008367 max:0.008421 min:0.008193
after:  [bench @ 0xaaaafb1d6d30] t:0.006099 avg:0.006100 max:0.006120 min:0.006026
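The improvement percentages above are speedups (before/after - 1), not reductions in run time. The small C sketch below recomputes them from the min: values of the bench output; the helper names `speedup_pct`, `time_saved_pct` and `print_bench_summary` are hypothetical. The speedup column reproduces the 38/49/45/45/36% figures, and the second column shows the same data as time saved: for example, the 49% speedup at 1280x720 corresponds to about 33% less bench time.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup as a percentage: before/after - 1, rounded to nearest. */
static int speedup_pct(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    double pct = 100.0 * (before_s / after_s - 1.0);
    return (int)(pct >= 0.0 ? pct + 0.5 : pct - 0.5);
}

/* Share of bench time saved: 1 - after/before, rounded to nearest. */
static int time_saved_pct(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    double pct = 100.0 * (1.0 - after_s / before_s);
    return (int)(pct >= 0.0 ? pct + 0.5 : pct - 0.5);
}

/* min: values (seconds) copied from the bench runs above. */
typedef struct { const char *size; double min_before, min_after; } BenchRow;

static const BenchRow rows[] = {
    { "1024x1024", 0.013293, 0.009637 },
    { "1280x720",  0.015973, 0.010708 },
    { "852x480",   0.013279, 0.009187 },
    { "640x360",   0.011653, 0.008057 },
    { "284x160",   0.008193, 0.006026 },
};

/* Print one line per output size with both ways of stating the gain. */
void print_bench_summary(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-9s speedup %d%%, time saved %d%%\n", rows[i].size,
               speedup_pct(rows[i].min_before, rows[i].min_after),
               time_saved_pct(rows[i].min_before, rows[i].min_after));
}
```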

Attachment: 0001-aarch64-improve-hscale-by-50-with-multi-threading.patch

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
