On 7/25/2024 1:50 PM, Rémi Denis-Courmont wrote:
On Thursday, 25 July 2024, 19:16:21 EEST, James Almer wrote:
On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
The current code assumes that we have unaligned rows, which hurts on
platforms with slower unaligned accesses. (Also, this lets the compiler
do the unrolling itself, which it seems to do in practice.)
---

   libavcodec/pixblockdsp.c | 9 ++++++++-
   1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
index bbbeca1618..1fff244511 100644
--- a/libavcodec/pixblockdsp.c
+++ b/libavcodec/pixblockdsp.c
@@ -26,6 +26,13 @@

   static void get_pixels_16_c(int16_t *restrict block, const uint8_t *pixels,
                               ptrdiff_t stride)

Is there a way to hint to the compiler that block is 16-byte aligned? GCC
14 at least emits unaligned loads and stores for these.

We don't have uint128_t, so the best we could do is cast to uint64_t *. Though
GCC 13 emits 64-bit loads and stores on RV64 here with the given code. Is this
maybe a problem with the AV_COPY128 macro definition on x86?

AV_COPY128 with GCC x86 uses aligned load intrinsics, but at least GCC 14 emits movdqu instructions here for some reason.
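
For reference, one generic way to pass an alignment hint with GCC or Clang is __builtin_assume_aligned. The following is only a sketch under stated assumptions, not the code from the patch: copy_rows_16_aligned is a hypothetical name, it assumes every source row and the destination block really are 16-byte aligned (so stride must be a multiple of 16), and whether the compiler then picks aligned instructions still depends on its version and tuning flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not FFmpeg code: copy 8 rows of 8 16-bit samples
 * (16 bytes per row) into a contiguous 8x8 block.
 * Assumption: block and every source row are 16-byte aligned. Breaking
 * that promise is undefined behaviour once the hint is given. */
static void copy_rows_16_aligned(int16_t *restrict block,
                                 const uint8_t *pixels, ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++) {
#if defined(__GNUC__)
        /* GCC/Clang builtin: returns its argument and lets the optimiser
         * assume the given alignment for accesses through the result. */
        int16_t *dst = __builtin_assume_aligned(block + i * 8, 16);
        const uint8_t *src = __builtin_assume_aligned(pixels + i * stride, 16);
#else
        /* No hint available: plain pointers, same behaviour. */
        int16_t *dst = block + i * 8;
        const uint8_t *src = pixels + i * stride;
#endif
        /* A fixed-size memcpy is normally lowered to plain loads/stores. */
        memcpy(dst, src, 16);
    }
}
```

The hint is applied per row because an aligned base pointer alone says nothing about pixels + i * stride when stride is not a compile-time constant.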