On Fri, 15 Apr 2022, Swinney, Jonathan wrote:

This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By
using specialized code with unrolling to match the filterSize we can improve
performance.
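
To illustrate the idea in plain C (the patch itself is AArch64 assembly), here is a minimal sketch of a vertical filter with a generic path and a path unrolled for filterSize == 4. The function names, the local clip helper, and the dispatcher are hypothetical and not taken from the patch. The arithmetic (dither shifted left by 12, result shifted right by 19) is an assumption modeled on the usual 8-bit scalar loop.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit output range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Generic path: the inner loop runs filterSize times per output pixel. */
static void vscale_generic(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Specialized path for filterSize == 4: row pointers and coefficients are
 * loaded once, and the inner loop is fully unrolled. */
static void vscale_4(const int16_t *filter, const int16_t **src,
                     uint8_t *dest, int dstW,
                     const uint8_t *dither, int offset)
{
    const int16_t *s0 = src[0], *s1 = src[1], *s2 = src[2], *s3 = src[3];
    const int f0 = filter[0], f1 = filter[1], f2 = filter[2], f3 = filter[3];

    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        val += s0[i] * f0;
        val += s1[i] * f1;
        val += s2[i] * f2;
        val += s3[i] * f3;
        dest[i] = clip_u8(val >> 19);
    }
}

/* Hypothetical dispatcher: use the unrolled path when the size matches. */
static void vscale_sketch(const int16_t *filter, int filterSize,
                          const int16_t **src, uint8_t *dest, int dstW,
                          const uint8_t *dither, int offset)
{
    if (filterSize == 4)
        vscale_4(filter, src, dest, dstW, dither, offset);
    else
        vscale_generic(filter, filterSize, src, dest, dstW, dither, offset);
}
```

Paths for filterSize 2 and 8 would follow the same pattern with two or eight unrolled taps. Both paths must produce identical output for the same input, which is the property a test should check.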

| (seconds)   | c6g   |       |       |
| ------------| ----- | ----- | ----- |
| filterSize  | 2     | 4     | 8     |
| original    | 0.581 | 0.974 | 1.744 |
| optimized   | 0.399 | 0.569 | 1.052 |
| improvement | 31.1% | 41.6% | 39.7% |

Signed-off-by: Jonathan Swinney <jswin...@amazon.com>
---
libswscale/aarch64/output.S  | 147 +++++++++++++++++++++++++++++++++--
libswscale/aarch64/swscale.c |  12 +++
2 files changed, 153 insertions(+), 6 deletions(-)

I'll have a closer look at the assembly itself at a later time, but first:

The checkasm tests in tests/checkasm/sw_scale.c do test yuv2planeX, but there's no testing of yuv2plane1; can you extend them to cover that too? Also, the existing test only covers filter sizes 1, 4, 8 and 16, so it apparently should be extended to test size 2 as well.
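
To show why the list of tested sizes matters, here is a rough standalone sketch in C. It does not use the real checkasm macros; every name in it (vfilter_fn, ref_vfilter, buggy_vfilter, count_mismatches) is hypothetical, and the filter arithmetic is an assumption modeled on the usual 8-bit scalar loop. A candidate with a bug only in its size-2 path passes when sizes 1, 4, 8, 16 are checked and fails once 2 is added.

```c
#include <stdint.h>
#include <string.h>

#define MAX_TAPS 16
#define WIDTH    64

typedef void (*vfilter_fn)(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset);

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference: plain loop over all taps. */
static void ref_vfilter(const int16_t *filter, int filterSize,
                        const int16_t **src, uint8_t *dest, int dstW,
                        const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Deliberately broken candidate: its specialized size-2 path reads the
 * wrong source row. All other sizes fall back to the reference. */
static void buggy_vfilter(const int16_t *filter, int filterSize,
                          const int16_t **src, uint8_t *dest, int dstW,
                          const uint8_t *dither, int offset)
{
    if (filterSize != 2) {
        ref_vfilter(filter, filterSize, src, dest, dstW, dither, offset);
        return;
    }
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        val += src[0][i] * filter[0];
        val += src[0][i] * filter[1]; /* bug: should be src[1][i] */
        dest[i] = clip_u8(val >> 19);
    }
}

/* Returns how many of the given filter sizes produce output that differs
 * from the reference. Input data is a fixed, deterministic pattern. */
static int count_mismatches(vfilter_fn candidate, const int *sizes, int nb_sizes)
{
    static int16_t rows[MAX_TAPS][WIDTH];
    const int16_t *src[MAX_TAPS];
    int16_t filter[MAX_TAPS];
    uint8_t dither[8], out_ref[WIDTH], out_new[WIDTH];
    int mismatches = 0;

    for (int k = 0; k < 8; k++)
        dither[k] = (uint8_t)(k * 16);
    for (int j = 0; j < MAX_TAPS; j++) {
        src[j] = rows[j];
        for (int i = 0; i < WIDTH; i++)
            rows[j][i] = (int16_t)(((i + 16 * j) & 255) << 7);
    }
    for (int s = 0; s < nb_sizes; s++) {
        int n = sizes[s];
        for (int j = 0; j < n; j++)
            filter[j] = (int16_t)(4096 / n); /* taps sum to 4096 */
        memset(out_ref, 0, WIDTH);
        memset(out_new, 0, WIDTH);
        ref_vfilter(filter, n, src, out_ref, WIDTH, dither, 0);
        candidate(filter, n, src, out_new, WIDTH, dither, 0);
        if (memcmp(out_ref, out_new, WIDTH))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(buggy_vfilter, ...) with the sizes {1, 4, 8, 16} reports no mismatch, while {1, 2, 4, 8, 16} reports one. A real checkasm test would use randomized input rather than this fixed pattern.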

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
