Re: [FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

2023-12-07 Thread Mikhail Nitenko
On Thu, 7 Dec 2023 at 18:03, Martin Storsjö wrote: > > Hi, > > On Mon, 4 Dec 2023, Mikhail Nitenko wrote: > > > --- > > I think the patch subject is missing to tell that this adds 10 bit > functions? > > > I remodeled the patch (as Martin once sug

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

2023-12-07 Thread Mikhail Nitenko
On Thu, 7 Dec 2023 at 18:03, Martin Storsjö wrote: > > Hi, > > On Mon, 4 Dec 2023, Mikhail Nitenko wrote: > > > --- > > I think the patch subject is missing to tell that this adds 10 bit > functions? Yes, you are right. Will you add it when you push it? Or shoul

[FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

2023-12-04 Thread Mikhail Nitenko
: 1850.7 1813.2 893.0 545.2 put_h264_qpel_16_mc33_10_c: 8688.7 8671.2 5223.2 3512.0 put_h264_qpel_16_mc33_10_neon: 1851.7 1814.2 908.5 535.2 Signed-off-by: Mikhail Nitenko --- I remodeled the patch (as Martin once suggested), it doesn't go to 32bits in lowpass_8_10 a

[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: h264, add idct for 10bit

2021-08-19 Thread Mikhail Nitenko
h264_idct_add16intra_10bpp_c: 784.7 439.5 h264_idct_add16intra_10bpp_neon: 641.0 462.2 Signed-off-by: Mikhail Nitenko --- there is a function that is not covered by tests, but I tested it with sample videos, not sure what to do with it libavcodec

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x4S and transpose_8x8S to neon.S

2021-08-19 Thread Mikhail Nitenko
transpose_4x4S and transpose_8x8S were declared in vp9itxfm_16bpp_neon, however these macros are not unique to vp9 and could be used elsewhere. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/neon.S | 49 libavcodec/aarch64/vp9itxfm_16bpp_neon.S

[FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64: h264, add chroma loop filters for 10bit

2021-08-19 Thread Mikhail Nitenko
h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 70.7 h264_v_loop_filter_chroma_intra_10bpp_neon: 48.7 31.5 Signed-off-by: Mikhail Nitenko --- fixed alignment, moved adds in h264_loop_filter_chroma_10 one instruction back, moved smin and smax together libavcodec/aarch64/h264dsp_init_aarch64

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x8H to neon.S

2021-08-19 Thread Mikhail Nitenko
transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is not unique to vp9 and could be used elsewhere. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/neon.S | 13 + libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12 2 files changed, 13

[FFmpeg-devel] [PATCH v3] lavc/aarch64: add pred functions for 10-bit

2021-08-19 Thread Mikhail Nitenko
Signed-off-by: Mikhail Nitenko --- fixed alignment, doing subs before st1 is only a bit faster for A53 and slower for A72 so I did not add it to this patch. libavcodec/aarch64/h264pred_init.c | 40 +++- libavcodec/aarch64/h264pred_neon.S | 304 - 2 files changed

[FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

2021-08-19 Thread Mikhail Nitenko
-by: Mikhail Nitenko --- libavcodec/aarch64/h264qpel_init_aarch64.c | 91 +++- libavcodec/aarch64/h264qpel_neon.S | 515 + 2 files changed, 604 insertions(+), 2 deletions(-) diff --git a/libavcodec/aarch64/h264qpel_init_aarch64.c b/libavcodec/aarch64

[FFmpeg-devel] [PATCH v2] lavc/aarch64: add pred functions for 10-bit

2021-08-16 Thread Mikhail Nitenko
Signed-off-by: Mikhail Nitenko --- moved to 32-bit, however, in plane the 16bit are not enough, and it overflows, so when it overflows the code starts using 32bit wide sections libavcodec/aarch64/h264pred_init.c | 40 +++- libavcodec/aarch64/h264pred_neon.S | 302

[FFmpeg-devel] [PATCH v2 2/2] lavc/aarch64: h264, add chroma loop filters for 10bit

2021-08-16 Thread Mikhail Nitenko
h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 70.7 h264_v_loop_filter_chroma_intra_10bpp_neon: 48.7 32.0 Signed-off-by: Mikhail Nitenko --- removed leftover code, moved from 32bit and started loading with two alternating registers, code became quite a bit faster! libavcodec/aarch64

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x8H to neon.S

2021-08-16 Thread Mikhail Nitenko
transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is not unique to vp9 and could be used elsewhere. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/neon.S | 13 + libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12 2 files changed, 13

[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: h264, add chroma loop filters for 10bit

2021-07-16 Thread Mikhail Nitenko
: 257.2 138.5 h264_v_loop_filter_chroma_10bpp_neon: 98.2 67.5 h264_v_loop_filter_chroma_intra_10bpp_c: 158.0 76.2 h264_v_loop_filter_chroma_intra_10bpp_neon: 62.7 36.5 Signed-off-by: Mikhail Nitenko --- this code is a bit slow

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x8H to neon.S

2021-07-16 Thread Mikhail Nitenko
transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is not unique to vp9 and could be used elsewhere. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/neon.S | 13 + libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12 2 files changed, 13

[FFmpeg-devel] [PATCH] lavc/aarch64: add pred functions for 10-bit

2021-07-16 Thread Mikhail Nitenko
Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/h264pred_init.c | 40 +++- libavcodec/aarch64/h264pred_neon.S | 369 - 2 files changed, 402 insertions(+), 7 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b/libavcodec/aarch64/h264pred_init.c

[FFmpeg-devel] [PATCH v4 2/2] lavc/aarch64: add pred16x16 10-bit functions

2021-04-15 Thread Mikhail Nitenko
Benchmarks: A53 A72 pred16x16_dc_10_c: 136.0 124.0 pred16x16_dc_10_neon: 121.2 106.0 pred16x16_horizontal_10_c: 155.0 73.2 pred16x16_horizontal_10_neon: 82.2 67.7 pred16x16_top_dc_10_c: 106.0 93.7 pred16x16_top_dc_10_neon

[FFmpeg-devel] [PATCH v4 1/2] lavc/aarch64: change h264pred_init structure

2021-04-15 Thread Mikhail Nitenko
Change structure to allow the addition of other bit depths. --- libavcodec/aarch64/h264pred_init.c | 57 ++ 1 file changed, 27 insertions(+), 30 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b/libavcodec/aarch64/h264pred_init.c index b144376f90..fc8989a

Re: [FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64: add pred16x16 10-bit functions

2021-04-14 Thread Mikhail Nitenko
will change code to use one addv instruction On Wed, 14 Apr 2021 at 16:18, chen wrote: > > Inlined a few comments for ff_pred16x16_top_dc_neon_10, other are similar. > > At 2021-04-14 20:35:44, "Martin Storsjö" wrote: > >On Tue, 13 Apr 2021, Mikhail Nitenko wrote: > >

[FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64: add pred16x16 10-bit functions

2021-04-13 Thread Mikhail Nitenko
Benchmarks: pred16x16_dc_10_c: 124.0 pred16x16_dc_10_neon: 97.2 pred16x16_horizontal_10_c: 71.7 pred16x16_horizontal_10_neon: 66.2 pred16x16_top_dc_10_c: 90.7 pred16x16_top_dc_10_neon: 71.5 pred16x16_vertical_10_c: 64.7 pred16x16_vertical_10_neon: 61.7 Some functions work slower than C and are lef

[FFmpeg-devel] [PATCH v3 1/2] lavc/aarch64: change h264pred_init structure

2021-04-13 Thread Mikhail Nitenko
Change structure to allow addition of other bit depths. --- libavcodec/aarch64/h264pred_init.c | 57 ++ 1 file changed, 27 insertions(+), 30 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b/libavcodec/aarch64/h264pred_init.c index b144376f90..fc8989ae0d

[FFmpeg-devel] [PATCH v2] lavc/aarch64: add pred16x16 10-bit functions

2021-04-12 Thread Mikhail Nitenko
left commented out. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/h264pred_init.c | 68 + libavcodec/aarch64/h264pred_neon.S | 117 + 2 files changed, 155 insertions(+), 30 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b

[FFmpeg-devel] [PATCH] lavc/aarch64: add pred16x16 10-bit functions

2021-04-08 Thread Mikhail Nitenko
here are the benchmarks https://0x1.st/kX.txt --- libavcodec/aarch64/h264pred_init.c | 75 +++--- libavcodec/aarch64/h264pred_neon.S | 123 + 2 files changed, 168 insertions(+), 30 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b/libavcodec/