On Wed, 21 Feb 2024, J. Dekker wrote:
Benched using single-threaded full decode on an Ampere Altra.
Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x
Signed-off-by: J. Dekker <j...@itanimul.li>
---
libavcodec/aarch64/hevcdsp_deblock_neon.S | 421 ++++++++++++++++++++++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 18 +
2 files changed, 439 insertions(+)
+0: // STRONG FILTER
+
+ // P0 = p0 + av_clip(((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3) - p0, -tc3, tc3);
+ add v21.8h, v2.8h, v3.8h // (p1 + p0
+ add v21.8h, v4.8h, v21.8h // + q0)
+ shl v21.8h, v21.8h, #1 // * 2
+ add v22.8h, v1.8h, v5.8h // (p2 + q1)
+ add v21.8h, v22.8h, v21.8h // +
+ srshr v21.8h, v21.8h, #3 // >> 3
+ sub v21.8h, v21.8h, v3.8h // - p0
+
The srshr line is incorrectly indented here (and elsewhere).
+ sqxtun v4.8b, v4.8h
+ sqxtun v5.8b, v5.8h
+ sqxtun v6.8b, v6.8h
+ sqxtun v7.8b, v7.8h
+.endif
+ ret
+3: ret x6
Please indent the "x6" here like the other operands.
+.macro hevc_loop_filter_luma dir bitdepth
+function ff_hevc_\dir\()_loop_filter_luma_\bitdepth\()_neon, export=1
+ mov x6, x30
+.if \dir == v
In GAS, .if does a numerical comparison; it can't do string
comparisons.
The right way to do this is ".ifc \dir, v", which does a string
comparison.
(If you really do need to do this like a numerical comparison, it's
possible to define e.g. "v" as a numeric symbol as well, see e.g.
https://code.videolan.org/videolan/dav1d/-/merge_requests/1603/diffs?commit_id=d4746c908c56cb2e8545efd348b8cdc13f2f2253
but that's not really the nicest way to do it.)
This issue breaks compilation with Clang. With gas-preprocessor (for
MSVC), the build succeeds, but the resulting code does the wrong thing.
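For illustration, here is a minimal sketch of the .ifc form suggested above.
The macro name "example_select" and the labels it generates are made up for
this example and are not part of the patch; the two mov instructions only
stand in for the direction-specific code.

```
        .text
// Hypothetical macro: picks a code path based on the string value of \dir.
.macro example_select dir
example_\dir\():
.ifc \dir, v
        mov             w0, #1          // assembled only when \dir is the string "v"
.else
        mov             w0, #0          // assembled for any other argument, e.g. "h"
.endif
        ret
.endm

example_select v                        // defines example_v
example_select h                        // defines example_h
```

Since .ifc compares the two operands as strings, no symbol named "v" has to
exist, which is what a plain ".if \dir == v" would otherwise require.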
To avoid having to test all these corner-case build configurations
manually, and having to remember to check indentation and the like, I've
set up a PoC for testing such things on GitHub Actions.
If you have a repo on GitHub, grab my commits from
https://github.com/mstorsjo/FFmpeg/commits/gha-aarch64 (there are a couple
of them), add your changes on top of these, push the result as a branch to
your own GitHub repo, and then check the output from the actions.
Here's the output of a run with the patches you just posted:
https://github.com/mstorsjo/FFmpeg/actions/runs/7988312683
// Martin