aarch64: new optimization for 8-bit hevc_epel_uni_v

Martin Storsjö Tue, 26 Sep 2023 05:53:18 -0700

Hi,

Thanks, this looks mostly ok now.

There were a few minor issues left that I can fix up before pushing. Therewere a number of cases with register restoring like this:


        ldr             x30, [sp]
        ldp             x4, x6, [sp, #16]
        ldp             x0, x1, [sp, #32]
        add             sp, sp, #48

Here we should fold the sp update ino the load as well, like this:

        ldp             x4, x6, [sp, #16]
        ldp             x0, x1, [sp, #32]
        ldr             x30, [sp], #48

In a few cases, this wasn't possible, due to the location of the registerthat is being restored, like this:


        ldr             x30, [sp, #56]
        add             sp, sp, #64

For the most idiomatic aarch64 assembly, I think it would be good torestructure it to keep x30 at the bottom of this area, to allow using thesame pattern for restores here too.

I pushed it with these changes. A later patch to restructure the registersaves to avoid the separate "add sp" would be appreciated though. I quitecertainly doesn't matter from a real-world performance perspective, but itwould make the code more idiomatic.


// Martin




On Sat, 23 Sep 2023, Logan.Lyu wrote:

Hi, Martin,

Thanks for your review.
Thanks for the patches. Functionally, they seem to work, and the issues isaw in the code are relatively minor. Unfortunately, some of the issues areissues that we've been through in many earlier patches, so I would hopethat you would pay attention to them in the future before posting morepatches.
Okay, I have noticed the previous issues and made some modificationsaccording to the issues, And I have completed the modifications based on yourcomments.
If there are any missing issues that have not been corrected, please let meknow.
在 2023/9/17 5:46, Martin Storsjö 写道:
On Thu, 14 Sep 2023, Logan.Lyu wrote:
Hi Martin,
You can try the attached patchset. If that doesn't work, My code branchaddress is https://github.com/myais2023/FFmpeg/tree/hevc-aarch64
Thanks for the patches. Functionally, they seem to work, and the issues isaw in the code are relatively minor. Unfortunately, some of the issues areissues that we've been through in many earlier patches, so I would hopethat you would pay attention to them in the future before posting morepatches.
In patch 1, you've got a bunch of sxtw instructions for src/dst strideparameters that have the type ptrdiff_t - that shouldn't be necessary?
In patch 2, you're moving the macros calc_epelh, calc_epelh2,load_epel_filterh - can you split out the move into a separate commit?(This isn't strictly necessary but would make things even clearer.)
In patch 2, you're storing below the stack, then decrementing it afterwards- e.g. like this:
+        stp             x0, x30, [sp, #-16]
+        stp             x1, x2, [sp, #-32]
+        stp             x3, x4, [sp, #-48]
+        stp             x5, x6, [sp, #-64]!
Please change that so that you're first predecrementing the whole area,then storing the other elements above that stack pointer, e.g. like this:
stp x0, x30, [sp, #-64]!
stp x1, x2, [sp, #16]
stp x3, x4, [sp, #32]

etc.

The same issue also appears in variouos places within functions like this:
+        stp             x0, x1, [sp, #-16]
+        stp             x4, x6, [sp, #-32]
+        stp             xzr, x30, [sp, #-48]!
Please fix all of these cases - you can search through your patches foranything related to storing on the stack. Also, storing xzr here seemssuperfluous - if you've got an odd number of registers to store, just makeone instruction str instead of stp (but keep the stack aligned).
Then in patch 4, you've got yet another pattern for doing these stores,where you have superfluous consecutive stack decrements like this:
+        stp             x6, x30, [sp, #-16]!
+        mov             x7, #16
+        stp             x0, x1, [sp, #-16]!
+        stp             x2, x3, [sp, #-16]!
+        stp             x4, x5, [sp, #-16]!
Please just do one stack decrement covering all the stack space you need.

I believe these issues have been raised in earlier reviews as well.

// Martin

_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 1/4] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v

Reply via email to