On Fri, 3 Sep 2021, Martin Storsjö wrote:
This gives rather big speedups on these functions:
Before:
put_h264_qpel_8_mc01_8_neon: 241.0 131.5 138.7
put_h264_qpel_8_mc02_8_neon: 214.7 121.2 127.5
put_h264_qpel_8_mc03_8_neon: 242.5 131.2 135.7
put_h264_qpel_8_mc11_8_neon: 421.2 218.7 251.0
put_h264_qpel_8_mc12_8_neon: 878.0 509.5 537.5
put_h264_qpel_8_mc13_8_neon: 423.7 217.0 252.0
put_h264_qpel_8_mc21_8_neon: 858.2 479.5 514.0
put_h264_qpel_8_mc22_8_neon: 649.7 385.2 403.0
put_h264_qpel_8_mc23_8_neon: 860.2 476.5 517.7
put_h264_qpel_8_mc31_8_neon: 437.2 219.5 252.5
put_h264_qpel_8_mc32_8_neon: 892.5 510.5 546.0
put_h264_qpel_8_mc33_8_neon: 438.2 218.5 257.0
put_h264_qpel_16_mc01_8_neon: 944.2 509.7 546.7
put_h264_qpel_16_mc02_8_neon: 878.7 469.5 509.7
put_h264_qpel_16_mc03_8_neon: 945.7 510.7 557.0
put_h264_qpel_16_mc11_8_neon: 1663.2 858.5 979.5
put_h264_qpel_16_mc12_8_neon: 3510.2 2027.7 2112.7
put_h264_qpel_16_mc13_8_neon: 1664.7 857.5 980.5
put_h264_qpel_16_mc21_8_neon: 3366.2 1928.5 2030.5
put_h264_qpel_16_mc22_8_neon: 2584.7 1514.7 1590.2
put_h264_qpel_16_mc23_8_neon: 3367.7 1927.7 2035.0
put_h264_qpel_16_mc31_8_neon: 1716.7 849.7 997.0
put_h264_qpel_16_mc32_8_neon: 3564.0 2044.2 3835.2
put_h264_qpel_16_mc33_8_neon: 1717.7 863.0 989.5
After:
put_h264_qpel_8_mc01_8_neon: 136.0 73.7 76.0
put_h264_qpel_8_mc02_8_neon: 108.7 65.0 64.0
put_h264_qpel_8_mc03_8_neon: 137.5 72.7 73.0
put_h264_qpel_8_mc11_8_neon: 316.2 159.0 188.5
put_h264_qpel_8_mc12_8_neon: 653.0 375.5 384.7
put_h264_qpel_8_mc13_8_neon: 318.7 165.5 189.5
put_h264_qpel_8_mc21_8_neon: 739.2 385.7 432.5
put_h264_qpel_8_mc22_8_neon: 530.7 295.5 309.5
put_h264_qpel_8_mc23_8_neon: 741.2 393.7 421.0
put_h264_qpel_8_mc31_8_neon: 332.2 162.5 190.0
put_h264_qpel_8_mc32_8_neon: 667.5 378.2 390.5
put_h264_qpel_8_mc33_8_neon: 332.7 166.5 195.5
put_h264_qpel_16_mc01_8_neon: 524.2 285.2 294.0
put_h264_qpel_16_mc02_8_neon: 454.7 252.2 250.2
put_h264_qpel_16_mc03_8_neon: 525.7 286.0 283.0
put_h264_qpel_16_mc11_8_neon: 1243.2 630.7 726.7
put_h264_qpel_16_mc12_8_neon: 2610.2 1479.7 1481.2
put_h264_qpel_16_mc13_8_neon: 1250.5 631.7 727.7
put_h264_qpel_16_mc21_8_neon: 2890.2 1571.2 1679.7
put_h264_qpel_16_mc22_8_neon: 2108.7 1177.5 1223.5
put_h264_qpel_16_mc23_8_neon: 2891.7 1578.7 1667.7
put_h264_qpel_16_mc31_8_neon: 1296.7 630.5 752.5
put_h264_qpel_16_mc32_8_neon: 2664.0 1483.2 1503.5
put_h264_qpel_16_mc33_8_neon: 1297.7 632.5 747.2
I.e. overall a 20%-60% reduction in runtime of these
functions.
---
libavcodec/aarch64/h264qpel_neon.S | 111 +++++++++++++++--------------
1 file changed, 56 insertions(+), 55 deletions(-)
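
For anyone wanting to check the per-function reduction against the numbers quoted above, here is a minimal sketch. It assumes the three values on each line are timings for three different test machines (the message does not label the columns), and it only copies a few sample rows; the name `reduction_percent` is made up for illustration.

```python
# Sample rows copied from the Before/After listings above.
# Column meaning is an assumption: the post does not label them.
before = {
    "put_h264_qpel_8_mc01_8_neon": (241.0, 131.5, 138.7),
    "put_h264_qpel_8_mc21_8_neon": (858.2, 479.5, 514.0),
    "put_h264_qpel_16_mc02_8_neon": (878.7, 469.5, 509.7),
}
after = {
    "put_h264_qpel_8_mc01_8_neon": (136.0, 73.7, 76.0),
    "put_h264_qpel_8_mc21_8_neon": (739.2, 385.7, 432.5),
    "put_h264_qpel_16_mc02_8_neon": (454.7, 252.2, 250.2),
}


def reduction_percent(old, new):
    """Return the runtime reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


for name, old_values in before.items():
    new_values = after[name]
    # One percentage per column, in the same order as the listing.
    cells = [
        f"{reduction_percent(o, n):5.1f}%"
        for o, n in zip(old_values, new_values)
    ]
    print(f"{name}: {' '.join(cells)}")
```

Extending the two dicts with the remaining rows gives the full per-function picture.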
Pushed.
// Martin