from:"Janne Grunau"

[FFmpeg-devel] [PATCH 1/1] arm64: fix inverted register order in transpose_4x4H

2015-12-18 Thread Janne Grunau

Fix related register order issue in ff_h264_idct_add_neon.

Found-by: zjh8890 <243186...@qq.com>
---
 libavcodec/aarch64/h264idct_neon.S | 4 ++--
 libavcodec/aarch64/neon.S  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/aarch64/h264idct_neon.S 
b/libavcodec/aarch64/h264idct_neon.S
index 99c2cb5..78f780a 100644
--- a/libavcodec/aarch64/h264idct_neon.S
+++ b/libavcodec/aarch64/h264idct_neon.S
@@ -37,8 +37,8 @@ function ff_h264_idct_add_neon, export=1
 sub v7.4H,  v16.4H, v3.4H
 add v0.4H,  v4.4H,  v6.4H
 add v1.4H,  v5.4H,  v7.4H
-sub v2.4H,  v4.4H,  v6.4H
-sub v3.4H,  v5.4H,  v7.4H
+sub v3.4H,  v4.4H,  v6.4H
+sub v2.4H,  v5.4H,  v7.4H
 
 transpose_4x4H  v0, v1, v2, v3, v4, v5, v6, v7
 
diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S
index 512ef19..cf3e24d 100644
--- a/libavcodec/aarch64/neon.S
+++ b/libavcodec/aarch64/neon.S
@@ -112,8 +112,8 @@
 .macro  transpose_4x4H  r0, r1, r2, r3, r4, r5, r6, r7
 trn1\r4\().4H,  \r0\().4H,  \r1\().4H
 trn2\r5\().4H,  \r0\().4H,  \r1\().4H
-trn1\r7\().4H,  \r3\().4H,  \r2\().4H
-trn2\r6\().4H,  \r3\().4H,  \r2\().4H
+trn1\r7\().4H,  \r2\().4H,  \r3\().4H
+trn2\r6\().4H,  \r2\().4H,  \r3\().4H
 trn1\r0\().2S,  \r4\().2S,  \r7\().2S
 trn2\r3\().2S,  \r4\().2S,  \r7\().2S
 trn1\r1\().2S,  \r5\().2S,  \r6\().2S
-- 
2.6.4

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [libav-commits] checkasm: add fmtconvert tests

2015-12-22 Thread Janne Grunau

On 2015-12-21 19:06:15 +0100, Janne Grunau  wrote:
> Module: libav
> Branch: master
> Commit: 489e6add4478b0f5717dbf644234c6f3a3baf02c
> 
> Author:    Janne Grunau 
> Committer: Janne Grunau 
> Date:  Tue Dec  8 16:24:57 2015 +0100
> 
> checkasm: add fmtconvert tests

This test fails unfortunately under valgrind but only under valgrind.  
Besides all the fate configs which are probably recent Intel CPUs I 
tested it also on a Core 2 Duo and an AMD K8 (the two oldest CPU I have 
available). My checkasm emms check is triggered under valgrind by 
'cvtpi2ps xmm1 mem64'. Valgrind does a x87 FPU to MMX state transition 
which all tested CPUs don't do. According to the current "Intel® 64 and 
IA-32 Architectures Software Developer's Manual Volume 2" from September 
2015 valgrinds behaviour is correct. But I think it is actually a bug in 
the documentation.

I found HTML copy from 1999 of Intel's manual(1) which says that 
cvtpi2ps with a memory location as source doesn't cause a transition to 
MMX state. The current documentation for cvtpi2pd (packed int to packed 
double conversion) says the same. Valgrind wasn't following that that 
until Vitor reported it as #210264(2) in 2009 and it was fixed in (3).  
As Julian Seward says in the commit message the situation is a little 
bit fishy.

I don't see a reason why the instruction should clobber mmx/fpu 
registers so I think it's probably a bug in the documentation (and 
valgrind). Since it will take a while to get fixed valgrind binaries on 
all fate hosts, I'd like to hide this failure temporarily by annotating 
the SSE versions of int32_to_float_fmul_scalar and 
int32_to_float_fmul_array8 as requiring an emms.

Janne

(1):
http://www.c-jump.com/CIS77/reference/Intel/CIS77_24319102/pg_0162.htm
(2): https://bugs.kde.org/show_bug.cgi?id=210264
(3): http://sourceforge.net/p/valgrind/mailman/message/24606437/

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [libav-devel] [PATCH 1/2] avcodec: Add Cineform HD Decoder

2016-01-10 Thread Janne Grunau

On 2016-01-10 00:28:49 +, Kieran Kunhya wrote:
> ---
>  libavcodec/Makefile |   1 +
>  libavcodec/allcodecs.c  |   1 +
>  libavcodec/avcodec.h|   1 +
>  libavcodec/cfhd.c   | 565 
> 
>  libavcodec/cfhd.h   |  99 +
>  libavcodec/cfhddata.c   | 470 
>  libavcodec/codec_desc.c |   6 +
>  7 files changed, 1143 insertions(+)
>  create mode 100644 libavcodec/cfhd.c
>  create mode 100644 libavcodec/cfhd.h
>  create mode 100644 libavcodec/cfhddata.c
> 
> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
> index b9ffdb9..c331a4f 100644
> --- a/libavcodec/Makefile
> +++ b/libavcodec/Makefile
> @@ -210,6 +210,7 @@ OBJS-$(CONFIG_CAVS_DECODER)+= cavs.o 
> cavsdec.o cavsdsp.o \
>  OBJS-$(CONFIG_CCAPTION_DECODER)+= ccaption_dec.o
>  OBJS-$(CONFIG_CDGRAPHICS_DECODER)  += cdgraphics.o
>  OBJS-$(CONFIG_CDXL_DECODER)+= cdxl.o
> +OBJS-$(CONFIG_CFHD_DECODER)+= cfhd.o cfhddata.o
>  OBJS-$(CONFIG_CINEPAK_DECODER) += cinepak.o
>  OBJS-$(CONFIG_CINEPAK_ENCODER) += cinepakenc.o elbg.o
>  OBJS-$(CONFIG_CLJR_DECODER)+= cljrdec.o
> diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
> index 2128546..1d92c8b 100644
> --- a/libavcodec/allcodecs.c
> +++ b/libavcodec/allcodecs.c
> @@ -147,6 +147,7 @@ void avcodec_register_all(void)
>  REGISTER_DECODER(CAVS,  cavs);
>  REGISTER_DECODER(CDGRAPHICS,cdgraphics);
>  REGISTER_DECODER(CDXL,  cdxl);
> +REGISTER_DECODER(CFHD,  cfhd);
>  REGISTER_ENCDEC (CINEPAK,   cinepak);
>  REGISTER_ENCDEC (CLJR,  cljr);
>  REGISTER_DECODER(CLLC,  cllc);
> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index f365775..b958a6c 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -315,6 +315,7 @@ enum AVCodecID {
>  AV_CODEC_ID_SMVJPEG,
>  AV_CODEC_ID_APNG,
>  AV_CODEC_ID_DAALA,
> +AV_CODEC_ID_CFHD,
>  
>  /* various PCM "codecs" */
>  AV_CODEC_ID_FIRST_AUDIO = 0x1, ///< A dummy id pointing at the 
> start of audio codecs
> diff --git a/libavcodec/cfhd.c b/libavcodec/cfhd.c
> new file mode 100644
> index 000..dc36889
> --- /dev/null
> +++ b/libavcodec/cfhd.c
> @@ -0,0 +1,565 @@
> +/*
> + * Copyright (c) 2015 Kieran Kunhya
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> + */
> +
> +/**
> + * @file
> + * CFHD Video Decoder
> + */
> +
> +#include "libavutil/buffer.h"
> +#include "libavutil/common.h"
> +#include "libavutil/intreadwrite.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/opt.h"
> +
> +#include "avcodec.h"
> +#include "bswapdsp.h"
> +#include "internal.h"
> +#include "bytestream.h"
> +#include "cfhd.h"
> +
> +static av_cold int cfhd_decode_init(AVCodecContext *avctx)
> +{
> +CFHDContext *s = avctx->priv_data;
> +
> +avctx->pix_fmt = AV_PIX_FMT_YUV422P10;
> +avctx->bits_per_raw_sample = 10;
> +s->avctx   = avctx;
> +
> +return ff_cfhd_init_vlcs(s);
> +}
> +
> +static void init_plane_defaults(CFHDContext *s)
> +{
> +s->subband_num= 0;
> +s->level  = 0;
> +s->subband_num_actual = 0;
> +}
> +
> +static void init_frame_defaults(CFHDContext *s)
> +{
> +s->bpc   = 10;
> +s->channel_cnt   = 4;
> +s->subband_cnt   = 10;
> +s->channel_num   = 0;
> +s->lowpass_precision = 16;
> +s->quantisation  = 1;
> +s->wavelet_depth = 3;
> +s->pshift= 1;
> +s->codebook  = 0;
> +init_plane_defaults(s);
> +}
> +
> +static inline int dequant_and_decompand(int level, int quantisation)
> +{
> +int abslevel = abs(level);
> +return (abslevel + ((768 * abslevel * abslevel * abslevel) / (255 * 255 
> * 255))) * FFSIGN(level) * quantisation;
> +}
> +
> +static inline void filter(int16_t *output, ptrdiff_t out_stride, int16_t 
> *low, ptrdiff_t low_stride,
> +  int16_t *high, ptrdiff_t high_stride, int len)
> +{
> +int32_t tmp;
> +
> +int i;
> +for (i = 0; i < len; i++) {
> +if( i == 0 )
> +

Re: [FFmpeg-devel] [PATCH] configure: arm: Don't add -march= to the compiler if no preference was passed

2021-09-21 Thread Janne Grunau

Hej,

On 2021-09-20 13:00:40 +0300, Martin Storsjö wrote:
> If no --cpu= option was passed to configure, we detect what the
> compiler defaults to. This detected value was then fed back to the
> rest of the configure logic, as if it was an explicit choice.
> 
> This breaks on Ubuntu 21.10 with GCC 11.1.
> 
> Since GCC 8, it's possible to add configure extra features via the
> -march option, like e.g. -march=armv7-a+neon. If the -mfpu= option
> is configured to default to 'auto', the fpu setting gets taken
> from the -march option.
> 
> GCC 11.1 in Ubuntu seems to be configured to use -mfpu=auto. This
> has the effect of breaking any compilation command that specifies
> -march=armv7-a, because the driver implicitly also adds -mfloat-abi=hard,
> and that combination results in this error:
> 
> cc1: error: ‘-mfloat-abi=hard’: selected processor lacks an FPU
> 
> Therefore, restructure configure. If no specific preference was set
> (and the 'cpu' configure variable was set as the output of
> probe_arm_arch), the value we tried to set via -march= was the same
> value that we just tried to detect as the compiler default.
> 
> So instead, just try to detect what the compiler defaults to, with
> to allow setting other configure settings (such as 'fast_unaligned'),
> but don't try to spell out the compiler's default via the -march flag.
> 
> Signed-off-by: Martin Storsjö 
> ---
>  configure | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 9249254b70..3a05b8294b 100755
> --- a/configure
> +++ b/configure
> @@ -5009,9 +5009,11 @@ elif enabled arm; then
>  fi
>  }
>  
> -[ "$cpu" = generic ] && cpu=$(probe_arm_arch)
>  
>  case $cpu in
> +generic)
> +subarch=$(probe_arm_arch | sed 's/[^a-z0-9]//g')
> +;;
>  armv*)
>  cpuflags="-march=$cpu"
>  subarch=$(echo $cpu | sed 's/[^a-z0-9]//g')

looks good to me

Janne

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2024-12-19 Thread Janne Grunau

This reduces the amount the horizontal filters read beyond the filter
width to a consistent 1 pixel. The data is not used so this is usually
not noticeable. It becomes a problem when the application allocates
frame buffers only for the aligned picture size and the end of it is at
a page boundary. This happens for picture sizes which are a multiple of
the page size like 1280x640. The frame buffer allocation is based on
its most likely done via mmap + MAP_ANONYMOUS so start and end of the
buffer are page aligned and the previous and next page are not
necessarily mapped.
This mirrors the aarch64 change.

Signed-off-by: Janne Grunau 
---
 libavcodec/arm/vp9mc_neon.S | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S
index bd8cda7c308f..2ec729bb314d 100644
--- a/libavcodec/arm/vp9mc_neon.S
+++ b/libavcodec/arm/vp9mc_neon.S
@@ -279,11 +279,13 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 sub r1,  r1,  r5
 .endif
 @ size >= 16 loads two qwords and increments r2,
-@ for size 4/8 it's enough with one qword and no
-@ postincrement
+@ size 4 loads 1 d word, increments r2 and loads 1 32-bit lane
+@ for size 8 it's enough with one qword and no postincrement
 .if \size >= 16
 sub r3,  r3,  r5
 sub r3,  r3,  #8
+.elseif \size == 4
+sub r3,  r3,  #8
 .endif
 @ Load the filter vector
 vld1.16 {q0},  [r12,:128]
@@ -295,9 +297,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 .if \size >= 16
 vld1.8  {d18, d19, d20}, [r2]!
 vld1.8  {d24, d25, d26}, [r7]!
-.else
+.elseif \size == 8
 vld1.8  {q9},  [r2]
 vld1.8  {q12}, [r7]
+.else @ size == 4
+vld1.8  {d18}, [r2]!
+vld1.8  {d24}, [r7]!
+vld1.32 {d19[0]}, [r2]
+vld1.32 {d25[0]}, [r7]
 .endif
 vmovl.u8q8,  d18
 vmovl.u8q9,  d19
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

2024-12-19 Thread Janne Grunau

The arm/aarch64 horizontal filter reads one additional pixel beyond what
the filter uses. This can become an issue if the application does not
allocate larger buffers than what's required for the pixel data. If the
motion vector points to the bottom right edge of the picture this
becomes a read buffer overflow. This triggers segfaults in Firefox for
video resolutions which result in a page aligned picture size like
1280x640.
Prevent this by using emulated edge in this case.

Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185
Signed-off-by: Janne Grunau 
---
 libavcodec/vp9recon.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/libavcodec/vp9recon.c b/libavcodec/vp9recon.c
index ef08ed17c8e5..1f1b99a03c8a 100644
--- a/libavcodec/vp9recon.c
+++ b/libavcodec/vp9recon.c
@@ -319,7 +319,11 @@ static av_always_inline void mc_luma_unscaled(VP9TileData 
*td, const vp9_mc_func
 // The arm/aarch64 _hv filters read one more row than what actually is
 // needed, so switch to emulated edge one pixel sooner vertically
 // (!!my * 5) than horizontally (!!mx * 4).
+// The arm/aarch64 _h filters read one more pixel than what actually is
+// needed, so switch to emulated edge if that would read beyond the bottom
+// right block.
 if (x < !!mx * 3 || y < !!my * 3 ||
+((x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) ||
 x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) {
 s->vdsp.emulated_edge_mc(td->edge_emu_buffer,
  ref - !!my * 3 * ref_stride - !!mx * 3 * 
bytesperpixel,
@@ -358,7 +362,11 @@ static av_always_inline void 
mc_chroma_unscaled(VP9TileData *td, const vp9_mc_fu
 // The arm/aarch64 _hv filters read one more row than what actually is
 // needed, so switch to emulated edge one pixel sooner vertically
 // (!!my * 5) than horizontally (!!mx * 4).
+// The arm/aarch64 _h filters read one more pixel than what actually is
+// needed, so switch to emulated edge if that would read beyond the bottom
+// right block.
 if (x < !!mx * 3 || y < !!my * 3 ||
+((x + !!mx * 5 > w - bw) && (y + !!my * 5 + 1 > h - bh)) ||
 x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) {
 s->vdsp.emulated_edge_mc(td->edge_emu_buffer,
  ref_u - !!my * 3 * src_stride_u - !!mx * 3 * 
bytesperpixel,
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2024-12-19 Thread Janne Grunau

This reduces the amount the horizontal filters read beyond the filter
width to a consistent 1 pixel. The data is not used so this is usually
not noticeable. It becomes a problem when the application allocates
frame buffers only for the aligned picture size and the end of it is at
a page boundary. This happens for picture sizes which are a multiple of
the page size like 1280x640. The frame buffer allocation is based on
its most likely done via mmap + MAP_ANONYMOUS so start and end of the
buffer are page aligned and the previous and next page are not
necessarily mapped.
Under these conditions like seen by Firefox a read beyond the end of the
buffer results in a segfault.
After the over-read is reduced to a single pixel it's reasonable to use
VP9's emulated edge motion compensation for this.

Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185
Signed-off-by: Janne Grunau 
---
 libavcodec/aarch64/vp9mc_neon.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S
index abf2bae9db07..38f44ca56d0d 100644
--- a/libavcodec/aarch64/vp9mc_neon.S
+++ b/libavcodec/aarch64/vp9mc_neon.S
@@ -230,6 +230,9 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 // reduced dst stride
 .if \size >= 16
 sub x1,  x1,  x5
+.elseif \size == 4
+add x12, x2,  #8
+add x13, x7,  #8
 .endif
 // size >= 16 loads two qwords and increments x2,
 // for size 4/8 it's enough with one qword and no
@@ -248,9 +251,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 .if \size >= 16
 ld1 {v4.8b,  v5.8b,  v6.8b},  [x2], #24
 ld1 {v16.8b, v17.8b, v18.8b}, [x7], #24
-.else
+.elseif \size == 8
 ld1 {v4.8b,  v5.8b},  [x2]
 ld1 {v16.8b, v17.8b}, [x7]
+.else // \size == 4
+ld1 {v4.8b},  [x2]
+ld1 {v16.8b}, [x7]
+ld1 {v5.s}[0],  [x12], x3
+ld1 {v17.s}[0], [x13], x3
 .endif
 uxtlv4.8h,  v4.8b
 uxtlv5.8h,  v5.8b
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2024-12-20 Thread Janne Grunau

On Fri, Dec 20, 2024 at 07:53:33PM +0100, Michael Niedermayer wrote:
> Hi Janne
> 
> On Thu, Dec 19, 2024 at 10:12:22PM +0100, Janne Grunau wrote:
> > This reduces the amount the horizontal filters read beyond the filter
> > width to a consistent 1 pixel. The data is not used so this is usually
> > not noticeable. It becomes a problem when the application allocates
> > frame buffers only for the aligned picture size and the end of it is at
> > a page boundary. This happens for picture sizes which are a multiple of
> > the page size like 1280x640. The frame buffer allocation is based on
> > its most likely done via mmap + MAP_ANONYMOUS so start and end of the
> > buffer are page aligned and the previous and next page are not
> > necessarily mapped.
> > This mirrors the aarch64 change.
> > 
> > Signed-off-by: Janne Grunau 
> > ---
> >  libavcodec/arm/vp9mc_neon.S | 13 ++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> tested on qemu-arm (assuming the standard fate and qemu covers this)

the asm is covered by checkasm and intentionally breaking the functions
was detected

Janne
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH v2 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

2025-01-02 Thread Janne Grunau

The arm/aarch64 horizontal filter reads one additional pixel beyond what
the filter uses. This can become an issue if the application does not
allocate larger buffers than what's required for the pixel data. If the
motion vector points to the bottom right edge of the picture this
becomes a read buffer overflow. This triggers segfaults in Firefox for
video resolutions which result in a page aligned picture size like
1280x640.
Prevent this by using emulated edge in this case.

Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185
Signed-off-by: Janne Grunau 
---
 libavcodec/vp9recon.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/libavcodec/vp9recon.c b/libavcodec/vp9recon.c
index ef08ed17c8e5..ccc49d771676 100644
--- a/libavcodec/vp9recon.c
+++ b/libavcodec/vp9recon.c
@@ -319,7 +319,11 @@ static av_always_inline void mc_luma_unscaled(VP9TileData 
*td, const vp9_mc_func
 // The arm/aarch64 _hv filters read one more row than what actually is
 // needed, so switch to emulated edge one pixel sooner vertically
 // (!!my * 5) than horizontally (!!mx * 4).
+// The arm/aarch64 _h filters read one more pixel than what actually is
+// needed, so switch to emulated edge if that would read beyond the bottom
+// right block.
 if (x < !!mx * 3 || y < !!my * 3 ||
+((ARCH_AARCH64 || ARCH_ARM) && (x + !!mx * 5 > w - bw) && (y + !!my * 
5 + 1 > h - bh)) ||
 x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) {
 s->vdsp.emulated_edge_mc(td->edge_emu_buffer,
  ref - !!my * 3 * ref_stride - !!mx * 3 * 
bytesperpixel,
@@ -358,7 +362,11 @@ static av_always_inline void 
mc_chroma_unscaled(VP9TileData *td, const vp9_mc_fu
 // The arm/aarch64 _hv filters read one more row than what actually is
 // needed, so switch to emulated edge one pixel sooner vertically
 // (!!my * 5) than horizontally (!!mx * 4).
+// The arm/aarch64 _h filters read one more pixel than what actually is
+// needed, so switch to emulated edge if that would read beyond the bottom
+// right block.
 if (x < !!mx * 3 || y < !!my * 3 ||
+((ARCH_AARCH64 || ARCH_ARM) && (x + !!mx * 5 > w - bw) && (y + !!my * 
5 + 1 > h - bh)) ||
 x + !!mx * 4 > w - bw || y + !!my * 5 > h - bh) {
 s->vdsp.emulated_edge_mc(td->edge_emu_buffer,
  ref_u - !!my * 3 * src_stride_u - !!mx * 3 * 
bytesperpixel,
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH v2 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2025-01-02 Thread Janne Grunau

This reduces the amount the horizontal filters read beyond the filter
width to a consistent 1 pixel. The data is not used so this is usually
not noticeable. It becomes a problem when the application allocates
frame buffers only for the aligned picture size and the end of it is at
a page boundary. This happens for picture sizes which are a multiple of
the page size like 1280x640. The frame buffer allocation is based on
its most likely done via mmap + MAP_ANONYMOUS so start and end of the
buffer are page aligned and the previous and next page are not
necessarily mapped.
Under these conditions like seen by Firefox a read beyond the end of the
buffer results in a segfault.
After the over-read is reduced to a single pixel it's reasonable to use
VP9's emulated edge motion compensation for this.

Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185
Signed-off-by: Janne Grunau 
---
 libavcodec/aarch64/vp9mc_neon.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S
index abf2bae9db07..38f44ca56d0d 100644
--- a/libavcodec/aarch64/vp9mc_neon.S
+++ b/libavcodec/aarch64/vp9mc_neon.S
@@ -230,6 +230,9 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 // reduced dst stride
 .if \size >= 16
 sub x1,  x1,  x5
+.elseif \size == 4
+add x12, x2,  #8
+add x13, x7,  #8
 .endif
 // size >= 16 loads two qwords and increments x2,
 // for size 4/8 it's enough with one qword and no
@@ -248,9 +251,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 .if \size >= 16
 ld1 {v4.8b,  v5.8b,  v6.8b},  [x2], #24
 ld1 {v16.8b, v17.8b, v18.8b}, [x7], #24
-.else
+.elseif \size == 8
 ld1 {v4.8b,  v5.8b},  [x2]
 ld1 {v16.8b, v17.8b}, [x7]
+.else // \size == 4
+ld1 {v4.8b},  [x2]
+ld1 {v16.8b}, [x7]
+ld1 {v5.s}[0],  [x12], x3
+ld1 {v17.s}[0], [x13], x3
 .endif
 uxtlv4.8h,  v4.8b
 uxtlv5.8h,  v5.8b
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH v2 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2025-01-02 Thread Janne Grunau

This reduces the amount the horizontal filters read beyond the filter
width to a consistent 1 pixel. The data is not used so this is usually
not noticeable. It becomes a problem when the application allocates
frame buffers only for the aligned picture size and the end of it is at
a page boundary. This happens for picture sizes which are a multiple of
the page size like 1280x640. The frame buffer allocation is based on
its most likely done via mmap + MAP_ANONYMOUS so start and end of the
buffer are page aligned and the previous and next page are not
necessarily mapped.
This mirrors the aarch64 change.

Signed-off-by: Janne Grunau 
---
 libavcodec/arm/vp9mc_neon.S | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S
index bd8cda7c308f..2ec729bb314d 100644
--- a/libavcodec/arm/vp9mc_neon.S
+++ b/libavcodec/arm/vp9mc_neon.S
@@ -279,11 +279,13 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 sub r1,  r1,  r5
 .endif
 @ size >= 16 loads two qwords and increments r2,
-@ for size 4/8 it's enough with one qword and no
-@ postincrement
+@ size 4 loads 1 d word, increments r2 and loads 1 32-bit lane
+@ for size 8 it's enough with one qword and no postincrement
 .if \size >= 16
 sub r3,  r3,  r5
 sub r3,  r3,  #8
+.elseif \size == 4
+sub r3,  r3,  #8
 .endif
 @ Load the filter vector
 vld1.16 {q0},  [r12,:128]
@@ -295,9 +297,14 @@ function \type\()_8tap_\size\()h_\idx1\idx2
 .if \size >= 16
 vld1.8  {d18, d19, d20}, [r2]!
 vld1.8  {d24, d25, d26}, [r7]!
-.else
+.elseif \size == 8
 vld1.8  {q9},  [r2]
 vld1.8  {q12}, [r7]
+.else @ size == 4
+vld1.8  {d18}, [r2]!
+vld1.8  {d24}, [r7]!
+vld1.32 {d19[0]}, [r2]
+vld1.32 {d25[0]}, [r7]
 .endif
 vmovl.u8q8,  d18
 vmovl.u8q9,  d19
-- 
2.45.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

2024-12-21 Thread Janne Grunau

On Fri, Dec 20, 2024 at 07:51:08PM +0100, Michael Niedermayer wrote:
> Hi Janne
> 
> On Thu, Dec 19, 2024 at 10:12:23PM +0100, Janne Grunau wrote:
> > The arm/aarch64 horizontal filter reads one additional pixel beyond what
> > the filter uses. This can become an issue if the application does not
> > allocate larger buffers than what's required for the pixel data. If the
> > motion vector points to the bottom right edge of the picture this
> > becomes a read buffer overflow. This triggers segfaults in Firefox for
> > video resolutions which result in a page aligned picture size like
> > 1280x640.
> > Prevent this by using emulated edge in this case.
> > 
> > Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185
> > Signed-off-by: Janne Grunau 
> > ---
> >  libavcodec/vp9recon.c | 8 
> >  1 file changed, 8 insertions(+)
> 
> patch LGTM
> maybe could have ARCH_... && added to avoid evaluating the extra condition
> when not needed

I didn't do that since I assumed that other asm might do that as well.
I checked the x86 asm which doesn't appear to the same issue.

I'll send an updated patch.

Janne

resend due to wrong From: address, if a ML admin reads please kill the
duplicate mail in the moderation queue (I've get no  notification that
the mail is held though).
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 1/1] arm64: fix inverted register order in transpose_4x4H

Re: [FFmpeg-devel] [libav-commits] checkasm: add fmtconvert tests

Re: [FFmpeg-devel] [libav-devel] [PATCH 1/2] avcodec: Add Cineform HD Decoder

Re: [FFmpeg-devel] [PATCH] configure: arm: Don't add -march= to the compiler if no preference was passed

[FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

[FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

[FFmpeg-devel] [PATCH 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

Re: [FFmpeg-devel] [PATCH 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

[FFmpeg-devel] [PATCH v2 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

[FFmpeg-devel] [PATCH v2 1/3] aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

[FFmpeg-devel] [PATCH v2 2/3] arm: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

Re: [FFmpeg-devel] [PATCH 3/3] vp9: recon: Use emulated edge to prevent buffer overflows

12 matches

Site Navigation

Mail list logo

Footer information