[FFmpeg-devel] [PATCH v1 1/6] avcodec/hevc: Add init for sao_edge_filter

2023-12-22 Thread jinbo
Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3. After this patch, the speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 is about 7% (42fps==>45fps). Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb --- libavcodec/loongarch/hevcdsp_init_loongarch.c | 4 1 file changed, 4 inserti
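As a rough illustration of what the commit message describes, the following is a minimal sketch of a DSP function-pointer table in which the architecture-specific init step fills only one slot, so the other slots keep the generic C fallback. All names here (DemoDSPContext, demo_sao_edge_c, demo_sao_edge_simd, the table size of 5, and the simplified function signature) are hypothetical stand-ins chosen for the example, not FFmpeg's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: five slots. The message only implies indices 0..3 were missed. */
#define DEMO_SAO_SLOTS 5

/* Hypothetical, simplified signature; the real prototype takes more arguments. */
typedef void (*demo_sao_edge_fn)(uint8_t *dst, const uint8_t *src, int n);

typedef struct DemoDSPContext {
    demo_sao_edge_fn sao_edge_filter[DEMO_SAO_SLOTS];
} DemoDSPContext;

/* Stand-in for the generic C implementation (just copies forward). */
static void demo_sao_edge_c(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Stand-in for an optimized implementation (copies backward, same result). */
static void demo_sao_edge_simd(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = n - 1; i >= 0; i--)
        dst[i] = src[i];
}

/* Generic init: every slot gets the C fallback. */
void demo_init_generic(DemoDSPContext *c)
{
    for (int idx = 0; idx < DEMO_SAO_SLOTS; idx++)
        c->sao_edge_filter[idx] = demo_sao_edge_c;
}

/* Arch init with the oversight: only the last slot is overridden. */
void demo_init_arch_before(DemoDSPContext *c)
{
    c->sao_edge_filter[4] = demo_sao_edge_simd;
}

/* Arch init after the fix: slots 0..3 are overridden as well. */
void demo_init_arch_after(DemoDSPContext *c)
{
    for (int idx = 0; idx < DEMO_SAO_SLOTS; idx++)
        c->sao_edge_filter[idx] = demo_sao_edge_simd;
}

/* Counts how many slots no longer point at the C fallback. */
int demo_count_optimized(const DemoDSPContext *c)
{
    int n = 0;
    for (int idx = 0; idx < DEMO_SAO_SLOTS; idx++)
        n += c->sao_edge_filter[idx] != demo_sao_edge_c;
    return n;
}
```

Calling demo_init_generic followed by demo_init_arch_before leaves four slots on the slow path without any visible failure, which is why an omission of this kind shows up as a performance gap rather than a decoding error.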

[FFmpeg-devel] [PATCH v1] [loongarch] Add hevc 128-bit & 256-bit asm optimizations

2023-12-22 Thread jinbo
Hello, everyone! The hevc asm optimizations are submitted; here is a brief introduction. After the 6 patches, the speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is about 33% (42fps-->56fps). Reviews are welcome, thanks in advance. [PATCH v1 1/6] avcodec/hevc: Add init for sao

[FFmpeg-devel] [PATCH v1 2/6] avcodec/hevc: Add add_residual_4/8/16/32 asm opt

2023-12-22 Thread jinbo
000..dd2d820af8 --- /dev/null +++ b/libavcodec/loongarch/hevc_add_res.S @@ -0,0 +1,162 @@ +/* + * Loongson LSX optimized add_residual functions for HEVC decoding + * + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFm
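For readers unfamiliar with the operation being optimized, here is a minimal scalar sketch of an 8-bit add_residual step: each residual coefficient is added to the predicted pixel and the sum is clamped to the 0..255 range. The function names (add_residual_ref, clip_u8) and the exact parameter list are assumptions made for this example; this is not the code from the patch, which implements the operation with LSX vector instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an integer to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/*
 * Scalar reference for a size x size block (hypothetical helper):
 * dst    - predicted pixels, updated in place
 * res    - residuals, stored contiguously with `size` values per row
 * stride - distance in bytes between rows of dst
 */
void add_residual_ref(uint8_t *dst, const int16_t *res,
                      ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;
        res += size;
    }
}
```

A SIMD version would typically process a whole row (or several rows) per instruction with saturating arithmetic, which is where per-size variants such as 4/8/16/32 come from.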

[FFmpeg-devel] [PATCH v1 4/6] avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm opt

2023-12-22 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_w_h4_8_c: 6.5 1.7 1.2
put_hevc_qpel_uni_w_h6_8_c: 14.5 4.5 3.7
put_hevc_qpel_uni_w_h8_8_c: 24.5 5.7 4.5
put_hevc_qpel_uni_w_h12_8_c: 54.7 17.5 12.0
put_hevc_qpel_uni_w_h1

[FFmpeg-devel] [PATCH v1 3/6] avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt

2023-12-22 Thread jinbo
000..c5d553effe --- /dev/null +++ b/libavcodec/loongarch/hevc_mc.S @@ -0,0 +1,471 @@ +/* + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under

[FFmpeg-devel] [PATCH v1 5/6] avcodec/hevc: Add epel_uni_w_hv4/6/8/12/16/24/32/48/64 asm opt

2023-12-22 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_epel_uni_w_hv4_8_c: 9.5 2.2
put_hevc_epel_uni_w_hv6_8_c: 18.5 5.0 3.7
put_hevc_epel_uni_w_hv8_8_c: 30.7 6.0 4.5
put_hevc_epel_uni_w_hv12_8_c: 63.7 14.0 10.7
put_hevc_epel_uni_w_hv16_8_c:

[FFmpeg-devel] [PATCH v1 6/6] avcodec/hevc: Add asm opt for the following functions

2023-12-22 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_h4_8_c: 5.7 1.2
put_hevc_qpel_uni_h6_8_c: 12.2 2.7
put_hevc_qpel_uni_h8_8_c: 21.5 3.2
put_hevc_qpel_uni_h12_8_c: 47.2 9.2 7.2
put_hevc_qpel_uni_h16_8_c: 87.0 11.7

[FFmpeg-devel] [PATCH v2 1/7] avcodec/hevc: Add init for sao_edge_filter

2023-12-26 Thread jinbo
Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3. After this patch, the speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 is about 7% (42fps==>45fps). Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb --- libavcodec/loongarch/hevcdsp_init_loongarch.c | 4 1 file changed, 4 inserti

[FFmpeg-devel] [PATCH v2] [loongarch] Add hevc 128-bit & 256-bit asm optimizations

2023-12-26 Thread jinbo
v2: Add patch 7/7. [PATCH v2 1/7] avcodec/hevc: Add init for sao_edge_filter [PATCH v2 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt [PATCH v2 3/7] avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt [PATCH v2 4/7] avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm o

[FFmpeg-devel] [PATCH v2 3/7] avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt

2023-12-26 Thread jinbo
000..c5d553effe --- /dev/null +++ b/libavcodec/loongarch/hevc_mc.S @@ -0,0 +1,471 @@ +/* + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under

[FFmpeg-devel] [PATCH v2 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt

2023-12-26 Thread jinbo
000..dd2d820af8 --- /dev/null +++ b/libavcodec/loongarch/hevc_add_res.S @@ -0,0 +1,162 @@ +/* + * Loongson LSX optimized add_residual functions for HEVC decoding + * + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFm

[FFmpeg-devel] [PATCH v2 4/7] avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm opt

2023-12-26 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_w_h4_8_c: 6.5 1.7 1.2
put_hevc_qpel_uni_w_h6_8_c: 14.5 4.5 3.7
put_hevc_qpel_uni_w_h8_8_c: 24.5 5.7 4.5
put_hevc_qpel_uni_w_h12_8_c: 54.7 17.5 12.0
put_hevc_qpel_uni_w_h1

[FFmpeg-devel] [PATCH v2 5/7] avcodec/hevc: Add epel_uni_w_hv4/6/8/12/16/24/32/48/64 asm opt

2023-12-26 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_epel_uni_w_hv4_8_c: 9.5 2.2
put_hevc_epel_uni_w_hv6_8_c: 18.5 5.0 3.7
put_hevc_epel_uni_w_hv8_8_c: 30.7 6.0 4.5
put_hevc_epel_uni_w_hv12_8_c: 63.7 14.0 10.7
put_hevc_epel_uni_w_hv16_8_c:

[FFmpeg-devel] [PATCH v2 7/7] avcodec/hevc: Add ff_hevc_idct_32x32_lasx asm opt

2023-12-26 Thread jinbo
From: yuanhecai
tests/checkasm/checkasm: C LSX LASX
hevc_idct_32x32_8_c: 1243.0 211.7 101.7
Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is 1fps (56fps-->57fps).
--- libavcodec/loongarch/Makefile | 3 +-
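To put the quoted figures side by side, the sketch below computes the per-function speedup from the checkasm numbers and the relative change in decoding frame rate. It assumes the checkasm figures are timings where lower is better; the helper names (timing_speedup, fps_gain_percent) are invented for this example.

```c
/* Ratio of baseline time to optimized time, e.g. C time / LASX time. */
double timing_speedup(double baseline, double optimized)
{
    return baseline / optimized;
}

/* Relative frame-rate change in percent, e.g. from 56 fps to 57 fps. */
double fps_gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers above, timing_speedup(1243.0, 211.7) is roughly 5.9 and timing_speedup(1243.0, 101.7) is roughly 12.2, while fps_gain_percent(56.0, 57.0) is roughly 1.8. A large gain in a single function translates into a small end-to-end gain when that function accounts for only a small share of total decoding time.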

[FFmpeg-devel] [PATCH v2 6/7] avcodec/hevc: Add asm opt for the following functions

2023-12-26 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_h4_8_c: 5.7 1.2
put_hevc_qpel_uni_h6_8_c: 12.2 2.7
put_hevc_qpel_uni_h8_8_c: 21.5 3.2
put_hevc_qpel_uni_h12_8_c: 47.2 9.2 7.2
put_hevc_qpel_uni_h16_8_c: 87.0 11.7

[FFmpeg-devel] [PATCH v3 4/7] avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm opt

2023-12-28 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_w_h4_8_c: 6.5 1.7 1.2
put_hevc_qpel_uni_w_h6_8_c: 14.5 4.5 3.7
put_hevc_qpel_uni_w_h8_8_c: 24.5 5.7 4.5
put_hevc_qpel_uni_w_h12_8_c: 54.7 17.5 12.0
put_hevc_qpel_uni_w_h1

[FFmpeg-devel] [PATCH v3 5/7] avcodec/hevc: Add epel_uni_w_hv4/6/8/12/16/24/32/48/64 asm opt

2023-12-28 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_epel_uni_w_hv4_8_c: 9.5 2.2
put_hevc_epel_uni_w_hv6_8_c: 18.5 5.0 3.7
put_hevc_epel_uni_w_hv8_8_c: 30.7 6.0 4.5
put_hevc_epel_uni_w_hv12_8_c: 63.7 14.0 10.7
put_hevc_epel_uni_w_hv16_8_c:

[FFmpeg-devel] [PATCH v3 6/7] avcodec/hevc: Add asm opt for the following functions

2023-12-28 Thread jinbo
tests/checkasm/checkasm: C LSX LASX
put_hevc_qpel_uni_h4_8_c: 5.7 1.2
put_hevc_qpel_uni_h6_8_c: 12.2 2.7
put_hevc_qpel_uni_h8_8_c: 21.5 3.2
put_hevc_qpel_uni_h12_8_c: 47.2 9.2 7.2
put_hevc_qpel_uni_h16_8_c: 87.0 11.7

[FFmpeg-devel] [PATCH v3 7/7] avcodec/hevc: Add ff_hevc_idct_32x32_lasx asm opt

2023-12-28 Thread jinbo
From: yuanhecai
tests/checkasm/checkasm: C LSX LASX
hevc_idct_32x32_8_c: 1243.0 211.7 101.7
Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is 1fps (56fps-->57fps).
--- libavcodec/loongarch/Makefile | 3 +-

[FFmpeg-devel] [PATCH v3 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt

2023-12-28 Thread jinbo
000..dd2d820af8 --- /dev/null +++ b/libavcodec/loongarch/hevc_add_res.S @@ -0,0 +1,162 @@ +/* + * Loongson LSX optimized add_residual functions for HEVC decoding + * + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFm

[FFmpeg-devel] [PATCH v3 1/7] avcodec/hevc: Add init for sao_edge_filter

2023-12-28 Thread jinbo
Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3. After this patch, the speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 is about 7% (42fps==>45fps). Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb --- libavcodec/loongarch/hevcdsp_init_loongarch.c | 4 1 file changed, 4 inserti

[FFmpeg-devel] [PATCH v3 3/7] avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt

2023-12-28 Thread jinbo
000..c5d553effe --- /dev/null +++ b/libavcodec/loongarch/hevc_mc.S @@ -0,0 +1,471 @@ +/* + * Copyright (c) 2023 Loongson Technology Corporation Limited + * Contributed by jinbo + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under

[FFmpeg-devel] [PATCH v1] swscale: Fix conflicting types for loongarch

2024-10-08 Thread jinbo
Build breaks after c1a0e657638f7007dcc807a2d985c22631fcd6d3 --- libswscale/loongarch/swscale_loongarch.h | 48 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/libswscale/loongarch/swscale_loongarch.h b/libswscale/loongarch/swscale_loongarch.h index 07c91bc