This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is based on its most likely done via mmap + MAP_ANONYMOUS so start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. This mirrors the aarch64 change.
Signed-off-by: Janne Grunau <janne-ffm...@jannau.net> --- libavcodec/arm/vp9mc_neon.S | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S index bd8cda7c308f..2ec729bb314d 100644 --- a/libavcodec/arm/vp9mc_neon.S +++ b/libavcodec/arm/vp9mc_neon.S @@ -279,11 +279,13 @@ function \type\()_8tap_\size\()h_\idx1\idx2 sub r1, r1, r5 .endif @ size >= 16 loads two qwords and increments r2, - @ for size 4/8 it's enough with one qword and no - @ postincrement + @ size 4 loads 1 d word, increments r2 and loads 1 32-bit lane + @ for size 8 it's enough with one qword and no postincrement .if \size >= 16 sub r3, r3, r5 sub r3, r3, #8 +.elseif \size == 4 + sub r3, r3, #8 .endif @ Load the filter vector vld1.16 {q0}, [r12,:128] @@ -295,9 +297,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2 .if \size >= 16 vld1.8 {d18, d19, d20}, [r2]! vld1.8 {d24, d25, d26}, [r7]! -.else +.elseif \size == 8 vld1.8 {q9}, [r2] vld1.8 {q12}, [r7] +.else @ size == 4 + vld1.8 {d18}, [r2]! + vld1.8 {d24}, [r7]! + vld1.32 {d19[0]}, [r2] + vld1.32 {d25[0]}, [r7] .endif vmovl.u8 q8, d18 vmovl.u8 q9, d19 -- 2.45.2 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".