[FFmpeg-devel] [PATCH 1/1] arm64: fix inverted register order in transpose_4x4H
Fix related register order issue in ff_h264_idct_add_neon. Found-by: zjh8890 <243186...@qq.com> --- libavcodec/aarch64/h264idct_neon.S | 4 ++-- libavcodec/aarch64/neon.S | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/aarch64/h264idct_neon.S b/libavcodec/aarch64/h264idct_neon.S index 99c2cb5..78f780a 100644 --- a/libavcodec/aarch64/h264idct_neon.S +++ b/libavcodec/aarch64/h264idct_neon.S @@ -37,8 +37,8 @@ function ff_h264_idct_add_neon, export=1 sub v7.4H, v16.4H, v3.4H add v0.4H, v4.4H, v6.4H add v1.4H, v5.4H, v7.4H -sub v2.4H, v4.4H, v6.4H -sub v3.4H, v5.4H, v7.4H +sub v3.4H, v4.4H, v6.4H +sub v2.4H, v5.4H, v7.4H transpose_4x4H v0, v1, v2, v3, v4, v5, v6, v7 diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S index 512ef19..cf3e24d 100644 --- a/libavcodec/aarch64/neon.S +++ b/libavcodec/aarch64/neon.S @@ -112,8 +112,8 @@ .macro transpose_4x4H r0, r1, r2, r3, r4, r5, r6, r7 trn1\r4\().4H, \r0\().4H, \r1\().4H trn2\r5\().4H, \r0\().4H, \r1\().4H -trn1\r7\().4H, \r3\().4H, \r2\().4H -trn2\r6\().4H, \r3\().4H, \r2\().4H +trn1\r7\().4H, \r2\().4H, \r3\().4H +trn2\r6\().4H, \r2\().4H, \r3\().4H trn1\r0\().2S, \r4\().2S, \r7\().2S trn2\r3\().2S, \r4\().2S, \r7\().2S trn1\r1\().2S, \r5\().2S, \r6\().2S -- 2.6.4 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [libav-commits] checkasm: add fmtconvert tests
On 2015-12-21 19:06:15 +0100, Janne Grunau wrote: > Module: libav > Branch: master > Commit: 489e6add4478b0f5717dbf644234c6f3a3baf02c > > Author: Janne Grunau > Committer: Janne Grunau > Date: Tue Dec 8 16:24:57 2015 +0100 > > checkasm: add fmtconvert tests This test fails unfortunately under valgrind but only under valgrind. Besides all the fate configs which are probably recent Intel CPUs I tested it also on a Core 2 Duo and an AMD K8 (the two oldest CPU I have available). My checkasm emms check is triggered under valgrind by 'cvtpi2ps xmm1 mem64'. Valgrind does a x87 FPU to MMX state transition which all tested CPUs don't do. According to the current "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2" from September 2015 valgrinds behaviour is correct. But I think it is actually a bug in the documentation. I found HTML copy from 1999 of Intel's manual(1) which says that cvtpi2ps with a memory location as source doesn't cause a transition to MMX state. The current documentation for cvtpi2pd (packed int to packed double conversion) says the same. Valgrind wasn't following that that until Vitor reported it as #210264(2) in 2009 and it was fixed in (3). As Julian Seward says in the commit message the situation is a little bit fishy. I don't see a reason why the instruction should clobber mmx/fpu registers so I think it's probably a bug in the documentation (and valgrind). Since it will take a while to get fixed valgrind binaries on all fate hosts, I'd like to hide this failure temporarily by annotating the SSE versions of int32_to_float_fmul_scalar and int32_to_float_fmul_array8 as requiring an emms. Janne (1): http://www.c-jump.com/CIS77/reference/Intel/CIS77_24319102/pg_0162.htm (2): https://bugs.kde.org/show_bug.cgi?id=210264 (3): http://sourceforge.net/p/valgrind/mailman/message/24606437/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [libav-devel] [PATCH 1/2] avcodec: Add Cineform HD Decoder
On 2016-01-10 00:28:49 +, Kieran Kunhya wrote: > --- > libavcodec/Makefile | 1 + > libavcodec/allcodecs.c | 1 + > libavcodec/avcodec.h| 1 + > libavcodec/cfhd.c | 565 > > libavcodec/cfhd.h | 99 + > libavcodec/cfhddata.c | 470 > libavcodec/codec_desc.c | 6 + > 7 files changed, 1143 insertions(+) > create mode 100644 libavcodec/cfhd.c > create mode 100644 libavcodec/cfhd.h > create mode 100644 libavcodec/cfhddata.c > > diff --git a/libavcodec/Makefile b/libavcodec/Makefile > index b9ffdb9..c331a4f 100644 > --- a/libavcodec/Makefile > +++ b/libavcodec/Makefile > @@ -210,6 +210,7 @@ OBJS-$(CONFIG_CAVS_DECODER)+= cavs.o > cavsdec.o cavsdsp.o \ > OBJS-$(CONFIG_CCAPTION_DECODER)+= ccaption_dec.o > OBJS-$(CONFIG_CDGRAPHICS_DECODER) += cdgraphics.o > OBJS-$(CONFIG_CDXL_DECODER)+= cdxl.o > +OBJS-$(CONFIG_CFHD_DECODER)+= cfhd.o cfhddata.o > OBJS-$(CONFIG_CINEPAK_DECODER) += cinepak.o > OBJS-$(CONFIG_CINEPAK_ENCODER) += cinepakenc.o elbg.o > OBJS-$(CONFIG_CLJR_DECODER)+= cljrdec.o > diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c > index 2128546..1d92c8b 100644 > --- a/libavcodec/allcodecs.c > +++ b/libavcodec/allcodecs.c > @@ -147,6 +147,7 @@ void avcodec_register_all(void) > REGISTER_DECODER(CAVS, cavs); > REGISTER_DECODER(CDGRAPHICS,cdgraphics); > REGISTER_DECODER(CDXL, cdxl); > +REGISTER_DECODER(CFHD, cfhd); > REGISTER_ENCDEC (CINEPAK, cinepak); > REGISTER_ENCDEC (CLJR, cljr); > REGISTER_DECODER(CLLC, cllc); > diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h > index f365775..b958a6c 100644 > --- a/libavcodec/avcodec.h > +++ b/libavcodec/avcodec.h > @@ -315,6 +315,7 @@ enum AVCodecID { > AV_CODEC_ID_SMVJPEG, > AV_CODEC_ID_APNG, > AV_CODEC_ID_DAALA, > +AV_CODEC_ID_CFHD, > > /* various PCM "codecs" */ > AV_CODEC_ID_FIRST_AUDIO = 0x1, ///< A dummy id pointing at the > start of audio codecs > diff --git a/libavcodec/cfhd.c b/libavcodec/cfhd.c > new file mode 100644 > index 000..dc36889 > --- /dev/null > +++ b/libavcodec/cfhd.c > @@ -0,0 +1,565 @@ > +/* > + * Copyright (c) 2015 Kieran Kunhya > + * > + * This file is part of FFmpeg. > + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 > USA > + */ > + > +/** > + * @file > + * CFHD Video Decoder > + */ > + > +#include "libavutil/buffer.h" > +#include "libavutil/common.h" > +#include "libavutil/intreadwrite.h" > +#include "libavutil/imgutils.h" > +#include "libavutil/opt.h" > + > +#include "avcodec.h" > +#include "bswapdsp.h" > +#include "internal.h" > +#include "bytestream.h" > +#include "cfhd.h" > + > +static av_cold int cfhd_decode_init(AVCodecContext *avctx) > +{ > +CFHDContext *s = avctx->priv_data; > + > +avctx->pix_fmt = AV_PIX_FMT_YUV422P10; > +avctx->bits_per_raw_sample = 10; > +s->avctx = avctx; > + > +return ff_cfhd_init_vlcs(s); > +} > + > +static void init_plane_defaults(CFHDContext *s) > +{ > +s->subband_num= 0; > +s->level = 0; > +s->subband_num_actual = 0; > +} > + > +static void init_frame_defaults(CFHDContext *s) > +{ > +s->bpc = 10; > +s->channel_cnt = 4; > +s->subband_cnt = 10; > +s->channel_num = 0; > +s->lowpass_precision = 16; > +s->quantisation = 1; > +s->wavelet_depth = 3; > +s->pshift= 1; > +s->codebook = 0; > +init_plane_defaults(s); > +} > + > +static inline int dequant_and_decompand(int level, int quantisation) > +{ > +int abslevel = abs(level); > +return (abslevel + ((768 * abslevel * abslevel * abslevel) / (255 * 255 > * 255))) * FFSIGN(level) * quantisation; > +} > + > +static inline void filter(int16_t *output, ptrdiff_t out_stride, int16_t > *low, ptrdiff_t low_stride, > + int16_t *high, ptrdiff_t high_stride, int len) > +{ > +int32_t tmp; > + > +int i; > +for (i = 0; i < len; i++) { > +if( i == 0 ) > +
Re: [FFmpeg-devel] [PATCH] configure: arm: Don't add -march= to the compiler if no preference was passed
Hej, On 2021-09-20 13:00:40 +0300, Martin Storsjö wrote: > If no --cpu= option was passed to configure, we detect what the > compiler defaults to. This detected value was then fed back to the > rest of the configure logic, as if it was an explicit choice. > > This breaks on Ubuntu 21.10 with GCC 11.1. > > Since GCC 8, it's possible to add configure extra features via the > -march option, like e.g. -march=armv7-a+neon. If the -mfpu= option > is configured to default to 'auto', the fpu setting gets taken > from the -march option. > > GCC 11.1 in Ubuntu seems to be configured to use -mfpu=auto. This > has the effect of breaking any compilation command that specifies > -march=armv7-a, because the driver implicitly also adds -mfloat-abi=hard, > and that combination results in this error: > > cc1: error: ‘-mfloat-abi=hard’: selected processor lacks an FPU > > Therefore, restructure configure. If no specific preference was set > (and the 'cpu' configure variable was set as the output of > probe_arm_arch), the value we tried to set via -march= was the same > value that we just tried to detect as the compiler default. > > So instead, just try to detect what the compiler defaults to, with > to allow setting other configure settings (such as 'fast_unaligned'), > but don't try to spell out the compiler's default via the -march flag. > > Signed-off-by: Martin Storsjö > --- > configure | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/configure b/configure > index 9249254b70..3a05b8294b 100755 > --- a/configure > +++ b/configure > @@ -5009,9 +5009,11 @@ elif enabled arm; then > fi > } > > -[ "$cpu" = generic ] && cpu=$(probe_arm_arch) > > case $cpu in > +generic) > +subarch=$(probe_arm_arch | sed 's/[^a-z0-9]//g') > +;; > armv*) > cpuflags="-march=$cpu" > subarch=$(echo $cpu | sed 's/[^a-z0-9]//g') looks good to me Janne ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter
This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is based on its most likely done via mmap + MAP_ANONYMOUS so start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. This mirrors the aarch64 change. Signed-off-by: Janne Grunau --- libavcodec/arm/vp9mc_neon.S | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S index bd8cda7c308f..2ec729bb314d 100644 --- a/libavcodec/arm/vp9mc_neon.S +++ b/libavcodec/arm/vp9mc_neon.S @@ -279,11 +279,13 @@ function \type\()_8tap_\size\()h_\idx1\idx2 sub r1, r1, r5 .endif @ size >= 16 loads two qwords and increments r2, -@ for size 4/8 it's enough with one qword and no -@ postincrement +@ size 4 loads 1 d word, increments r2 and loads 1 32-bit lane +@ for size 8 it's enough with one qword and no postincrement .if \size >= 16 sub r3, r3, r5 sub r3, r3, #8 +.elseif \size == 4 +sub r3, r3, #8 .endif @ Load the filter vector vld1.16 {q0}, [r12,:128] @@ -295,9 +297,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2 .if \size >= 16 vld1.8 {d18, d19, d20}, [r2]! vld1.8 {d24, d25, d26}, [r7]! -.else +.elseif \size == 8 vld1.8 {q9}, [r2] vld1.8 {q12}, [r7] +.else @ size == 4 +vld1.8 {d18}, [r2]! +vld1.8 {d24}, [r7]! +vld1.32 {d19[0]}, [r2] +vld1.32 {d25[0]}, [r7] .endif vmovl.u8q8, d18 vmovl.u8q9, d19 -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows
The arm/aarch64 horizontal filter reads one additional pixel beyond what the filter uses. This can become an issue if the application does not allocate larger buffers than what's required for the pixel data. If the motion vector points to the bottom right edge of the picture this becomes a read buffer overflow. This triggers segfaults in Firefox for video resolutions which result in a page aligned picture size like 1280x640. Prevent this by using emulated edge in this case. Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 Signed-off-by: Janne Grunau --- libavcodec/vp9recon.c | 8 1 file changed, 8 insertions(+) diff --git a/libavcodec/vp9recon.c b/libavcodec/vp9recon.c index ef08ed17c8e5..1f1b99a03c8a 100644 --- a/libavcodec/vp9recon.c +++ b/libavcodec/vp9recon.c @@ -319,7 +319,11 @@ static av_always_inline void mc_luma_unscaled(VP9TileData *td, const vp9_mc_func // The arm/aarch64 _hv filters read one more row than what actually is // needed, so switch to emulated edge one pixel sooner vertically // (!!my * 5) than horizontally (!!mx * 4). +// The arm/aarch64 _h filters read one more pixel than what actually is +// needed, so switch to emulated edge if that would read beyond the bottom +// right block. if (x < !!mx * 3 || y < !!my * 3 || +((x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) || x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) { s->vdsp.emulated_edge_mc(td->edge_emu_buffer, ref - !!my * 3 * ref_stride - !!mx * 3 * bytesperpixel, @@ -358,7 +362,11 @@ static av_always_inline void mc_chroma_unscaled(VP9TileData *td, const vp9_mc_fu // The arm/aarch64 _hv filters read one more row than what actually is // needed, so switch to emulated edge one pixel sooner vertically // (!!my * 5) than horizontally (!!mx * 4). +// The arm/aarch64 _h filters read one more pixel than what actually is +// needed, so switch to emulated edge if that would read beyond the bottom +// right block. if (x < !!mx * 3 || y < !!my * 3 || +((x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) || x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) { s->vdsp.emulated_edge_mc(td->edge_emu_buffer, ref_u - !!my * 3 * src_stride_u - !!mx * 3 * bytesperpixel, -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter
This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is based on its most likely done via mmap + MAP_ANONYMOUS so start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. Under these conditions like seen by Firefox a read beyond the end of the buffer results in a segfault. After the over-read is reduced to a single pixel it's reasonable to use VP9's emulated edge motion compensation for this. Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 Signed-off-by: Janne Grunau --- libavcodec/aarch64/vp9mc_neon.S | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S index abf2bae9db07..38f44ca56d0d 100644 --- a/libavcodec/aarch64/vp9mc_neon.S +++ b/libavcodec/aarch64/vp9mc_neon.S @@ -230,6 +230,9 @@ function \type\()_8tap_\size\()h_\idx1\idx2 // reduced dst stride .if \size >= 16 sub x1, x1, x5 +.elseif \size == 4 +add x12, x2, #8 +add x13, x7, #8 .endif // size >= 16 loads two qwords and increments x2, // for size 4/8 it's enough with one qword and no @@ -248,9 +251,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2 .if \size >= 16 ld1 {v4.8b, v5.8b, v6.8b}, [x2], #24 ld1 {v16.8b, v17.8b, v18.8b}, [x7], #24 -.else +.elseif \size == 8 ld1 {v4.8b, v5.8b}, [x2] ld1 {v16.8b, v17.8b}, [x7] +.else // \size == 4 +ld1 {v4.8b}, [x2] +ld1 {v16.8b}, [x7] +ld1 {v5.s}[0], [x12], x3 +ld1 {v17.s}[0], [x13], x3 .endif uxtlv4.8h, v4.8b uxtlv5.8h, v5.8b -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter
On Fri, Dec 20, 2024 at 07:53:33PM +0100, Michael Niedermayer wrote: > Hi Janne > > On Thu, Dec 19, 2024 at 10:12:22PM +0100, Janne Grunau wrote: > > This reduces the amount the horizontal filters read beyond the filter > > width to a consistent 1 pixel. The data is not used so this is usually > > not noticeable. It becomes a problem when the application allocates > > frame buffers only for the aligned picture size and the end of it is at > > a page boundary. This happens for picture sizes which are a multiple of > > the page size like 1280x640. The frame buffer allocation is based on > > its most likely done via mmap + MAP_ANONYMOUS so start and end of the > > buffer are page aligned and the previous and next page are not > > necessarily mapped. > > This mirrors the aarch64 change. > > > > Signed-off-by: Janne Grunau > > --- > > libavcodec/arm/vp9mc_neon.S | 13 ++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > tested on qemu-arm (assuming the standard fate and qemu covers this) the asm is covered by checkasm and intentionally breaking the functions was detected Janne ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2 3/3] vp9: recon: Use emulated edge to prevent buffer overflows
The arm/aarch64 horizontal filter reads one additional pixel beyond what the filter uses. This can become an issue if the application does not allocate larger buffers than what's required for the pixel data. If the motion vector points to the bottom right edge of the picture this becomes a read buffer overflow. This triggers segfaults in Firefox for video resolutions which result in a page aligned picture size like 1280x640. Prevent this by using emulated edge in this case. Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 Signed-off-by: Janne Grunau --- libavcodec/vp9recon.c | 8 1 file changed, 8 insertions(+) diff --git a/libavcodec/vp9recon.c b/libavcodec/vp9recon.c index ef08ed17c8e5..ccc49d771676 100644 --- a/libavcodec/vp9recon.c +++ b/libavcodec/vp9recon.c @@ -319,7 +319,11 @@ static av_always_inline void mc_luma_unscaled(VP9TileData *td, const vp9_mc_func // The arm/aarch64 _hv filters read one more row than what actually is // needed, so switch to emulated edge one pixel sooner vertically // (!!my * 5) than horizontally (!!mx * 4). +// The arm/aarch64 _h filters read one more pixel than what actually is +// needed, so switch to emulated edge if that would read beyond the bottom +// right block. if (x < !!mx * 3 || y < !!my * 3 || +((ARCH_AARCH64 || ARCH_ARM) && (x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) || x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) { s->vdsp.emulated_edge_mc(td->edge_emu_buffer, ref - !!my * 3 * ref_stride - !!mx * 3 * bytesperpixel, @@ -358,7 +362,11 @@ static av_always_inline void mc_chroma_unscaled(VP9TileData *td, const vp9_mc_fu // The arm/aarch64 _hv filters read one more row than what actually is // needed, so switch to emulated edge one pixel sooner vertically // (!!my * 5) than horizontally (!!mx * 4). +// The arm/aarch64 _h filters read one more pixel than what actually is +// needed, so switch to emulated edge if that would read beyond the bottom +// right block. if (x < !!mx * 3 || y < !!my * 3 || +((ARCH_AARCH64 || ARCH_ARM) && (x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) || x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) { s->vdsp.emulated_edge_mc(td->edge_emu_buffer, ref_u - !!my * 3 * src_stride_u - !!mx * 3 * bytesperpixel, -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter
This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is based on its most likely done via mmap + MAP_ANONYMOUS so start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. Under these conditions like seen by Firefox a read beyond the end of the buffer results in a segfault. After the over-read is reduced to a single pixel it's reasonable to use VP9's emulated edge motion compensation for this. Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 Signed-off-by: Janne Grunau --- libavcodec/aarch64/vp9mc_neon.S | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S index abf2bae9db07..38f44ca56d0d 100644 --- a/libavcodec/aarch64/vp9mc_neon.S +++ b/libavcodec/aarch64/vp9mc_neon.S @@ -230,6 +230,9 @@ function \type\()_8tap_\size\()h_\idx1\idx2 // reduced dst stride .if \size >= 16 sub x1, x1, x5 +.elseif \size == 4 +add x12, x2, #8 +add x13, x7, #8 .endif // size >= 16 loads two qwords and increments x2, // for size 4/8 it's enough with one qword and no @@ -248,9 +251,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2 .if \size >= 16 ld1 {v4.8b, v5.8b, v6.8b}, [x2], #24 ld1 {v16.8b, v17.8b, v18.8b}, [x7], #24 -.else +.elseif \size == 8 ld1 {v4.8b, v5.8b}, [x2] ld1 {v16.8b, v17.8b}, [x7] +.else // \size == 4 +ld1 {v4.8b}, [x2] +ld1 {v16.8b}, [x7] +ld1 {v5.s}[0], [x12], x3 +ld1 {v17.s}[0], [x13], x3 .endif uxtlv4.8h, v4.8b uxtlv5.8h, v5.8b -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter
This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is based on its most likely done via mmap + MAP_ANONYMOUS so start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. This mirrors the aarch64 change. Signed-off-by: Janne Grunau --- libavcodec/arm/vp9mc_neon.S | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S index bd8cda7c308f..2ec729bb314d 100644 --- a/libavcodec/arm/vp9mc_neon.S +++ b/libavcodec/arm/vp9mc_neon.S @@ -279,11 +279,13 @@ function \type\()_8tap_\size\()h_\idx1\idx2 sub r1, r1, r5 .endif @ size >= 16 loads two qwords and increments r2, -@ for size 4/8 it's enough with one qword and no -@ postincrement +@ size 4 loads 1 d word, increments r2 and loads 1 32-bit lane +@ for size 8 it's enough with one qword and no postincrement .if \size >= 16 sub r3, r3, r5 sub r3, r3, #8 +.elseif \size == 4 +sub r3, r3, #8 .endif @ Load the filter vector vld1.16 {q0}, [r12,:128] @@ -295,9 +297,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2 .if \size >= 16 vld1.8 {d18, d19, d20}, [r2]! vld1.8 {d24, d25, d26}, [r7]! -.else +.elseif \size == 8 vld1.8 {q9}, [r2] vld1.8 {q12}, [r7] +.else @ size == 4 +vld1.8 {d18}, [r2]! +vld1.8 {d24}, [r7]! +vld1.32 {d19[0]}, [r2] +vld1.32 {d25[0]}, [r7] .endif vmovl.u8q8, d18 vmovl.u8q9, d19 -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows
On Fri, Dec 20, 2024 at 07:51:08PM +0100, Michael Niedermayer wrote: > Hi Janne > > On Thu, Dec 19, 2024 at 10:12:23PM +0100, Janne Grunau wrote: > > The arm/aarch64 horizontal filter reads one additional pixel beyond what > > the filter uses. This can become an issue if the application does not > > allocate larger buffers than what's required for the pixel data. If the > > motion vector points to the bottom right edge of the picture this > > becomes a read buffer overflow. This triggers segfaults in Firefox for > > video resolutions which result in a page aligned picture size like > > 1280x640. > > Prevent this by using emulated edge in this case. > > > > Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 > > Signed-off-by: Janne Grunau > > --- > > libavcodec/vp9recon.c | 8 > > 1 file changed, 8 insertions(+) > > patch LGTM > maybe could have ARCH_... && added to avoid evaluating the extra condition > when not needed I didn't do that since I assumed that other asm might do that as well. I checked the x86 asm which doesn't appear to the same issue. I'll send an updated patch. Janne resend due to wrong From: address, if a ML admin reads please kill the duplicate mail in the moderation queue (I've get no notification that the mail is held though). ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".