On Thu, 7 Dec 2023 at 18:03, Martin Storsjö wrote:
>
> Hi,
>
> On Mon, 4 Dec 2023, Mikhail Nitenko wrote:
>
> > ---
>
> I think the patch subject fails to mention that this adds 10 bit
> functions?
>
> > I remodeled the patch (as Martin once sug
Yes, you are right. Will you add it when you push it? Or shoul
: 1850.7 1813.2 893.0 545.2
put_h264_qpel_16_mc33_10_c: 8688.7 8671.2 5223.2 3512.0
put_h264_qpel_16_mc33_10_neon: 1851.7 1814.2 908.5 535.2
Signed-off-by: Mikhail Nitenko
---
I remodeled the patch (as Martin once suggested), it doesn't
go to 32bits in lowpass_8_10 a
h264_idct_add16intra_10bpp_c: 784.7 439.5
h264_idct_add16intra_10bpp_neon: 641.0 462.2
Signed-off-by: Mikhail Nitenko
---
there is a function that is not covered by tests, but I tested it with
sample videos, not sure what to do with it
libavcodec
transpose_4x4S and transpose_8x8S were declared in vp9itxfm_16bpp_neon, however
these macros are not unique to vp9 and could be used elsewhere.
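
For readers unfamiliar with the macro, the operation it performs can be sketched in plain C. This is only an illustration, assuming the S suffix refers to 32-bit lanes; the function name transpose_4x4_s32 is hypothetical and not part of the patch, and the real macro works on NEON registers rather than on an array in memory.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent of a 4x4 transpose of 32-bit elements.
 * Swaps every element above the diagonal with its mirror below it. */
static void transpose_4x4_s32(int32_t m[4][4])
{
    for (int i = 0; i < 4; i++) {
        for (int j = i + 1; j < 4; j++) {
            int32_t tmp = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = tmp;
        }
    }
}
```

Nothing in this operation depends on the codec, which is the point the commit message makes about sharing the macro.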
Signed-off-by: Mikhail Nitenko
---
libavcodec/aarch64/neon.S | 49
libavcodec/aarch64/vp9itxfm_16bpp_neon.S
h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 70.7
h264_v_loop_filter_chroma_intra_10bpp_neon: 48.7 31.5
Signed-off-by: Mikhail Nitenko
---
fixed alignment, moved adds in h264_loop_filter_chroma_10 one
instruction back, moved smin and smax together
libavcodec/aarch64/h264dsp_init_aarch64
transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is
not unique to vp9 and could be used elsewhere.
Signed-off-by: Mikhail Nitenko
---
libavcodec/aarch64/neon.S | 13 +
libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12
2 files changed, 13
Signed-off-by: Mikhail Nitenko
---
fixed alignment, doing subs before st1 is only a bit faster for A53 and
slower for A72 so I did not add it to this patch.
libavcodec/aarch64/h264pred_init.c | 40 +++-
libavcodec/aarch64/h264pred_neon.S | 304 -
2 files changed
Signed-off-by: Mikhail Nitenko
---
libavcodec/aarch64/h264qpel_init_aarch64.c | 91 +++-
libavcodec/aarch64/h264qpel_neon.S | 515 +
2 files changed, 604 insertions(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c
b/libavcodec/aarch64
Signed-off-by: Mikhail Nitenko
---
moved to 32-bit; however, in plane, 16 bits are not enough and the
computation overflows, so when it overflows the code switches to 32-bit
wide sections
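
The overflow mentioned above can be checked with a small calculation. The sketch below assumes the standard H.264 16x16 plane-prediction gradient, H = sum over k = 1..8 of k * (p[7+k] - p[7-k]), and computes its worst-case magnitude for a given bit depth. The helper names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Worst-case magnitude of the 16x16 plane-prediction gradient H
 * when every pixel difference equals the maximum pixel value. */
static int32_t plane_gradient_max(int bit_depth)
{
    int32_t max_pixel = (1 << bit_depth) - 1;
    int32_t h = 0;
    for (int k = 1; k <= 8; k++)
        h += k * max_pixel;          /* weights 1..8 sum to 36 */
    return h;
}

/* Returns 1 if v can be held in a signed 16-bit lane, 0 otherwise. */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

For 8-bit content the bound is 36 * 255 = 9180, which fits in a signed 16-bit lane. For 10-bit content it is 36 * 1023 = 36828, which does not, so only extreme inputs need the wider path.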
libavcodec/aarch64/h264pred_init.c | 40 +++-
libavcodec/aarch64/h264pred_neon.S | 302
h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 70.7
h264_v_loop_filter_chroma_intra_10bpp_neon: 48.7 32.0
Signed-off-by: Mikhail Nitenko
---
removed leftover code, moved from 32bit and started loading with two
alternating registers, code became quite a bit faster!
libavcodec/aarch64
: 257.2 138.5
h264_v_loop_filter_chroma_10bpp_neon: 98.2 67.5
h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 76.2
h264_v_loop_filter_chroma_intra_10bpp_neon: 62.7 36.5
Signed-off-by: Mikhail Nitenko
---
this code is a bit slow
Signed-off-by: Mikhail Nitenko
---
libavcodec/aarch64/h264pred_init.c | 40 +++-
libavcodec/aarch64/h264pred_neon.S | 369 -
2 files changed, 402 insertions(+), 7 deletions(-)
diff --git a/libavcodec/aarch64/h264pred_init.c
b/libavcodec/aarch64/h264pred_init.c
Benchmarks: A53 A72
pred16x16_dc_10_c: 136.0 124.0
pred16x16_dc_10_neon: 121.2 106.0
pred16x16_horizontal_10_c: 155.0 73.2
pred16x16_horizontal_10_neon: 82.2 67.7
pred16x16_top_dc_10_c: 106.0 93.7
pred16x16_top_dc_10_neon
Change structure to allow the addition of other bit depths.
---
libavcodec/aarch64/h264pred_init.c | 57 ++
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/libavcodec/aarch64/h264pred_init.c
b/libavcodec/aarch64/h264pred_init.c
index b144376f90..fc8989a
will change code to use one addv instruction
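
As context for the addv remark, here is a scalar C sketch of what a 16x16 top DC prediction computes for 10-bit samples, assuming the usual H.264 definition (sum the 16 pixels above the block, round, divide by 16, fill the block). The function name and the stride in pixels are assumptions for illustration, not the code under review. The loop that builds the sum is the horizontal reduction that a single across-lanes add can replace.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: dst points at the top-left pixel of the
 * block, stride_px is the row stride in pixels (not bytes). */
static void pred16x16_top_dc_10_ref(uint16_t *dst, ptrdiff_t stride_px)
{
    unsigned sum = 0;

    /* Horizontal sum of the row directly above the block. */
    for (int x = 0; x < 16; x++)
        sum += dst[x - stride_px];

    /* Rounded average of the 16 samples. */
    uint16_t dc = (uint16_t)((sum + 8) >> 4);

    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y * stride_px + x] = dc;
}
```

With 10-bit input the sum is at most 16 * 1023 = 16368, so it fits in a 16-bit lane and no widening is needed before the reduction.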
On Wed, 14 Apr 2021 at 16:18, chen wrote:
>
> Inlined a few comments for ff_pred16x16_top_dc_neon_10; the others are similar.
>
> At 2021-04-14 20:35:44, "Martin Storsjö" wrote:
> >On Tue, 13 Apr 2021, Mikhail Nitenko wrote:
> >
Benchmarks:
pred16x16_dc_10_c: 124.0
pred16x16_dc_10_neon: 97.2
pred16x16_horizontal_10_c: 71.7
pred16x16_horizontal_10_neon: 66.2
pred16x16_top_dc_10_c: 90.7
pred16x16_top_dc_10_neon: 71.5
pred16x16_vertical_10_c: 64.7
pred16x16_vertical_10_neon: 61.7
Some functions work slower than C and are lef
Change structure to allow addition of other bit depths.
---
libavcodec/aarch64/h264pred_init.c | 57 ++
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/libavcodec/aarch64/h264pred_init.c
b/libavcodec/aarch64/h264pred_init.c
index b144376f90..fc8989ae0d
left commented out.
Signed-off-by: Mikhail Nitenko
---
libavcodec/aarch64/h264pred_init.c | 68 +
libavcodec/aarch64/h264pred_neon.S | 117 +
2 files changed, 155 insertions(+), 30 deletions(-)
diff --git a/libavcodec/aarch64/h264pred_init.c
b
here are the benchmarks https://0x1.st/kX.txt
---
libavcodec/aarch64/h264pred_init.c | 75 +++---
libavcodec/aarch64/h264pred_neon.S | 123 +
2 files changed, 168 insertions(+), 30 deletions(-)
diff --git a/libavcodec/aarch64/h264pred_init.c
b/libavcodec/