Hi,
Thanks, this looks mostly ok now.
There were a few minor issues left that I can fix up before pushing. There
were a number of cases with register restoring like this:
ldr x30, [sp]
ldp x4, x6, [sp, #16]
ldp x0, x1, [sp, #32]
add sp, sp, #48
Here we should fold the sp update ino the load as well, like this:
ldp x4, x6, [sp, #16]
ldp x0, x1, [sp, #32]
ldr x30, [sp], #48
In a few cases, this wasn't possible, due to the location of the register
that is being restored, like this:
ldr x30, [sp, #56]
add sp, sp, #64
For the most idiomatic aarch64 assembly, I think it would be good to
restructure it to keep x30 at the bottom of this area, to allow using the
same pattern for restores here too.
I pushed it with these changes. A later patch to restructure the register
saves to avoid the separate "add sp" would be appreciated though. I quite
certainly doesn't matter from a real-world performance perspective, but it
would make the code more idiomatic.
// Martin
On Sat, 23 Sep 2023, Logan.Lyu wrote:
Hi, Martin,
Thanks for your review.
Thanks for the patches. Functionally, they seem to work, and the issues i
saw in the code are relatively minor. Unfortunately, some of the issues are
issues that we've been through in many earlier patches, so I would hope
that you would pay attention to them in the future before posting more
patches.
Okay, I have noticed the previous issues and made some modifications
according to the issues, And I have completed the modifications based on your
comments.
If there are any missing issues that have not been corrected, please let me
know.
在 2023/9/17 5:46, Martin Storsjö 写道:
On Thu, 14 Sep 2023, Logan.Lyu wrote:
Hi Martin,
You can try the attached patchset. If that doesn't work, My code branch
address is https://github.com/myais2023/FFmpeg/tree/hevc-aarch64
Thanks for the patches. Functionally, they seem to work, and the issues i
saw in the code are relatively minor. Unfortunately, some of the issues are
issues that we've been through in many earlier patches, so I would hope
that you would pay attention to them in the future before posting more
patches.
In patch 1, you've got a bunch of sxtw instructions for src/dst stride
parameters that have the type ptrdiff_t - that shouldn't be necessary?
In patch 2, you're moving the macros calc_epelh, calc_epelh2,
load_epel_filterh - can you split out the move into a separate commit?
(This isn't strictly necessary but would make things even clearer.)
In patch 2, you're storing below the stack, then decrementing it afterwards
- e.g. like this:
+ stp x0, x30, [sp, #-16]
+ stp x1, x2, [sp, #-32]
+ stp x3, x4, [sp, #-48]
+ stp x5, x6, [sp, #-64]!
Please change that so that you're first predecrementing the whole area,
then storing the other elements above that stack pointer, e.g. like this:
stp x0, x30, [sp, #-64]!
stp x1, x2, [sp, #16]
stp x3, x4, [sp, #32]
etc.
The same issue also appears in variouos places within functions like this:
+ stp x0, x1, [sp, #-16]
+ stp x4, x6, [sp, #-32]
+ stp xzr, x30, [sp, #-48]!
Please fix all of these cases - you can search through your patches for
anything related to storing on the stack. Also, storing xzr here seems
superfluous - if you've got an odd number of registers to store, just make
one instruction str instead of stp (but keep the stack aligned).
Then in patch 4, you've got yet another pattern for doing these stores,
where you have superfluous consecutive stack decrements like this:
+ stp x6, x30, [sp, #-16]!
+ mov x7, #16
+ stp x0, x1, [sp, #-16]!
+ stp x2, x3, [sp, #-16]!
+ stp x4, x5, [sp, #-16]!
Please just do one stack decrement covering all the stack space you need.
I believe these issues have been raised in earlier reviews as well.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".