Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3.
After this patch, the speedup of decoding H265 4K 30FPS
30Mbps on 3A6000 is about 7% (42fps==>45fps).
Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb
---
libavcodec/loongarch/hevcdsp_init_loongarch.c | 4
1 file changed, 4 inserti
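
For readers unfamiliar with this kind of fix, here is a minimal sketch of
the pattern the commit message describes: a table of function pointers
where only some slots were switched to the optimized routine. Everything
named demo_* or Demo* is hypothetical and is not FFmpeg's API. The slot
count of 5, and the idea that only the last slot had been set before,
are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SAO_SLOTS 5 /* assumed number of block-size slots */

typedef void (*demo_sao_edge_fn)(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, int width, int height);

/* Hypothetical stand-in for a DSP context holding one filter per slot. */
typedef struct DemoDSPContext {
    demo_sao_edge_fn sao_edge_filter[DEMO_SAO_SLOTS];
} DemoDSPContext;

/* Portable fallback; the body is a placeholder that copies the block. */
void demo_sao_edge_c(uint8_t *dst, const uint8_t *src,
                     ptrdiff_t stride, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * stride + x] = src[y * stride + x];
}

/* Stand-in for the optimized routine; it reuses the placeholder body. */
void demo_sao_edge_simd(uint8_t *dst, const uint8_t *src,
                        ptrdiff_t stride, int width, int height)
{
    demo_sao_edge_c(dst, src, stride, width, height);
}

/* Every slot starts out pointing at the portable fallback. */
void demo_init_c(DemoDSPContext *c)
{
    for (int i = 0; i < DEMO_SAO_SLOTS; i++)
        c->sao_edge_filter[i] = demo_sao_edge_c;
}

/* Models the oversight: only the last slot gets the optimized routine. */
void demo_init_simd_partial(DemoDSPContext *c)
{
    c->sao_edge_filter[DEMO_SAO_SLOTS - 1] = demo_sao_edge_simd;
}

/* Models the fix: slots 0..3 are assigned as well. */
void demo_init_simd_full(DemoDSPContext *c)
{
    for (int i = 0; i < DEMO_SAO_SLOTS; i++)
        c->sao_edge_filter[i] = demo_sao_edge_simd;
}

/* Counts how many slots point at the optimized routine. */
int demo_count_simd_slots(const DemoDSPContext *c)
{
    int n = 0;
    for (int i = 0; i < DEMO_SAO_SLOTS; i++)
        if (c->sao_edge_filter[i] == demo_sao_edge_simd)
            n++;
    return n;
}
```

With the partial init, calls through the unassigned slots silently keep
using the fallback, so nothing fails and the only symptom is lost speed.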
Hello, everyone! The HEVC asm optimizations have been submitted; here is a
brief introduction. After the 6 patches, the speedup of decoding H265
4K 30FPS 30Mbps on 3A6000 with 8 threads is about 33% (42fps-->56fps).
Reviews are welcome. Thanks in advance.
[PATCH v1 1/6] avcodec/hevc: Add init for sao
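
The percentages quoted in these messages follow from the frame rates by
simple arithmetic. A small sketch of that calculation; the helper name
demo_speedup_percent is hypothetical:

```c
/* Relative throughput gain in percent, given frame rates before and
 * after a change. The caller must pass a non-zero before_fps. */
double demo_speedup_percent(double before_fps, double after_fps)
{
    return (after_fps - before_fps) / before_fps * 100.0;
}
```

For 42 fps to 56 fps this gives (56 - 42) / 42 * 100, roughly 33.3%, and
for 42 fps to 45 fps roughly 7.1%, in line with the figures quoted above.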
000..dd2d820af8
--- /dev/null
+++ b/libavcodec/loongarch/hevc_add_res.S
@@ -0,0 +1,162 @@
+/*
+ * Loongson LSX optimized add_residual functions for HEVC decoding
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ * Contributed by jinbo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFm
tests/checkasm/checkasm:        C      LSX    LASX
put_hevc_qpel_uni_w_h4_8_c:     6.5    1.7    1.2
put_hevc_qpel_uni_w_h6_8_c:     14.5   4.5    3.7
put_hevc_qpel_uni_w_h8_8_c:     24.5   5.7    4.5
put_hevc_qpel_uni_w_h12_8_c:    54.7   17.5   12.0
put_hevc_qpel_uni_w_h1
000..c5d553effe
--- /dev/null
+++ b/libavcodec/loongarch/hevc_mc.S
@@ -0,0 +1,471 @@
+/*
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ * Contributed by jinbo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under
tests/checkasm/checkasm:        C      LSX    LASX
put_hevc_epel_uni_w_hv4_8_c:    9.5    2.2
put_hevc_epel_uni_w_hv6_8_c:    18.5   5.0    3.7
put_hevc_epel_uni_w_hv8_8_c:    30.7   6.0    4.5
put_hevc_epel_uni_w_hv12_8_c:   63.7   14.0   10.7
put_hevc_epel_uni_w_hv16_8_c:
tests/checkasm/checkasm:        C      LSX    LASX
put_hevc_qpel_uni_h4_8_c:       5.7    1.2
put_hevc_qpel_uni_h6_8_c:       12.2   2.7
put_hevc_qpel_uni_h8_8_c:       21.5   3.2
put_hevc_qpel_uni_h12_8_c:      47.2   9.2    7.2
put_hevc_qpel_uni_h16_8_c:      87.0   11.7
v2: Add patch 7/7.
[PATCH v2 1/7] avcodec/hevc: Add init for sao_edge_filter
[PATCH v2 2/7] avcodec/hevc: Add add_residual_4/8/16/32 asm opt
[PATCH v2 3/7] avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt
[PATCH v2 4/7] avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm o
From: yuanhecai
tests/checkasm/checkasm:
C LSX LASX
hevc_idct_32x32_8_c: 1243.0 211.7 101.7
Speedup of decoding H265 4K 30FPS 30Mbps on
3A6000 with 8 threads is 1fps(56fps-->57fps).
---
libavcodec/loongarch/Makefile | 3 +-
Build breaks after c1a0e657638f7007dcc807a2d985c22631fcd6d3
---
libswscale/loongarch/swscale_loongarch.h | 48
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/libswscale/loongarch/swscale_loongarch.h
b/libswscale/loongarch/swscale_loongarch.h
index 07c91bc