[FFmpeg-devel] [PATCH] avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.

2019-08-05 Thread gxw
Changes are as follows: 1. Modified the parameter order of SLDI_Bn; the previous order was difficult to understand. 2. Removed the redundant macro SLDI_Bn_0 and used SLDI_Bn instead. --- libavcodec/mips/h264dsp_msa.c | 9 ++-- libavcodec/mips/h264qpel_msa.c | 64 +

[FFmpeg-devel] [PATCH v2] avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.

2019-08-06 Thread gxw
Changes are as follows: 1. The previous parameter order was irregular and difficult to understand. Adjust the order of the parameters according to the rule: (RTYPE, input registers, input mask/input index/..., output registers). Most of the existing msa macros follow this rule. 2
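The ordering rule above (RTYPE, input registers, input index, output registers) can be sketched with a portable stand-in. This is only an illustration: the names v16u8, v16i8, slide_bytes and SLIDE_B2 are defined locally for the example, the vector types use the GCC/Clang vector extension instead of MSA registers, and the byte-slide model is an assumption that is not claimed to match the operand semantics of the real SLDI_Bn macros.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for 128-bit vector registers (GCC/Clang vector extension). */
typedef uint8_t v16u8 __attribute__((vector_size(16)));
typedef int8_t  v16i8 __attribute__((vector_size(16)));

/* Hypothetical scalar model of a byte slide: returns bytes n..n+15 of the
 * 32-byte concatenation [lo | hi]. Valid for 0 <= n <= 16. */
static v16u8 slide_bytes(v16u8 lo, v16u8 hi, int n)
{
    uint8_t cat[32];
    v16u8 out;

    memcpy(cat, &lo, 16);
    memcpy(cat + 16, &hi, 16);
    memcpy(&out, cat + n, 16);
    return out;
}

/* Parameter order follows the stated rule:
 * (RTYPE, input registers..., input index, output registers...). */
#define SLIDE_B2(RTYPE, lo0, hi0, lo1, hi1, n, out0, out1) \
do {                                                        \
    out0 = (RTYPE) slide_bytes(lo0, hi0, n);                \
    out1 = (RTYPE) slide_bytes(lo1, hi1, n);                \
} while (0)
```

With this layout a reader can tell inputs from outputs by position alone: everything before the index is read, everything after it is written.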

[FFmpeg-devel] [PATCH] avutil/mips: refine msa macros CLIP_*.

2019-08-06 Thread gxw
Changes are as follows: 1. Refine CLIP_SH so that results are written in place to the input vectors. 2. Reimplement the macro CLIP_SH/Wn_0_255. The new macro is more efficient than before. 3. Remove CLIP_SH/Wn_0_255_MAX_SATU. CLIP_SH/Wn_0_255_MAX_SATU and CLIP_SH/Wn_0_255 have the same function. It is n

[FFmpeg-devel] [PATCH v2] avutil/mips: refine msa macros CLIP_*.

2019-08-07 Thread gxw
Changes are as follows: 1. Remove the local variable out_m in CLIP_SH. Results are assigned to the input vector, which reduces data copying. 2. Reimplement the macro CLIP_SH/Wn_0_255. VP8 decoding performance improved by 1.1% (7.03x to 7.11x, tested on Loongson 3A4000). 3. Remove

[FFmpeg-devel] [PATCH v3] avutil/mips: refine msa macros CLIP_*.

2019-08-07 Thread gxw
Changes are as follows: 1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in the source vector. 2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'. VP8 decoding performance improved by about 1.1% (from 7.03x to 7.11x). 3. Remove redundant macr

[FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.

2019-08-07 Thread gxw
Changes are as follows: 1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in the source vector. 2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'. VP8 decoding performance improved by about 1.1% (from 7.03x to 7.11x). Performance of H264 d
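A scalar sketch of what clipping signed 16-bit values to the range 0..255 in place can look like. The snippet above does not show the refined implementation, so the two-step form below (signed maximum with 0, then unsigned saturation to 8 bits) is an assumption about one plausible approach, and the function names are hypothetical.

```c
#include <stdint.h>

/* Reference clamp using two comparisons per element. */
static int16_t clip_0_255_ref(int16_t x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}

/* Two-step clamp: signed max with 0, then unsigned saturation to 8 bits.
 * Assumed model only; not taken from the patch. */
static int16_t clip_0_255_max_sat(int16_t x)
{
    int16_t  nonneg = x > 0 ? x : 0;      /* max(x, 0)           */
    uint16_t u      = (uint16_t) nonneg;  /* now safely unsigned */

    return (int16_t) (u > 255 ? 255 : u); /* saturate to 0..255  */
}

/* In-place variant: the result overwrites the input, so no temporary
 * output buffer is needed (mirrors the idea of dropping 'out_m'). */
static void clip_block_0_255(int16_t *v, int n)
{
    for (int i = 0; i < n; i++)
        v[i] = clip_0_255_max_sat(v[i]);
}
```

Both scalar functions agree for every 16-bit input; the second form simply maps more directly onto a "max, then saturate" pair of vector operations.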

[FFmpeg-devel] [PATCH] avcodec/mips: simplified code in vp3dsp_idct_msa.c.

2019-09-15 Thread gxw
Use the ADD8 macro to replace runs of consecutive addition operations. --- libavcodec/mips/vp3dsp_idct_msa.c | 80 - libavutil/mips/generic_macros_msa.h | 6 +++ 2 files changed, 22 insertions(+), 64 deletions(-) diff --git a/libavcodec/mips/vp3dsp_idct_msa.c b/
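A minimal sketch of how eight consecutive additions can be folded into one macro call. The actual ADD8 definition is cut off in the snippet above, so this assumes it is built pairwise from smaller add macros; the EX_ADD2, EX_ADD4 and EX_ADD8 names and the add8_rows helper are hypothetical and operate on plain integers instead of MSA vectors.

```c
/* out0 = in0 + in1, out1 = in2 + in3 */
#define EX_ADD2(in0, in1, in2, in3, out0, out1) \
do {                                            \
    out0 = (in0) + (in1);                       \
    out1 = (in2) + (in3);                       \
} while (0)

/* Four additions, expressed as two EX_ADD2 calls. */
#define EX_ADD4(in0, in1, in2, in3, in4, in5, in6, in7, \
                out0, out1, out2, out3)                 \
do {                                                    \
    EX_ADD2(in0, in1, in2, in3, out0, out1);            \
    EX_ADD2(in4, in5, in6, in7, out2, out3);            \
} while (0)

/* Eight additions, expressed as two EX_ADD4 calls. */
#define EX_ADD8(in0, in1, in2, in3, in4, in5, in6, in7,           \
                in8, in9, in10, in11, in12, in13, in14, in15,     \
                out0, out1, out2, out3, out4, out5, out6, out7)   \
do {                                                              \
    EX_ADD4(in0, in1, in2, in3, in4, in5, in6, in7,               \
            out0, out1, out2, out3);                              \
    EX_ADD4(in8, in9, in10, in11, in12, in13, in14, in15,         \
            out4, out5, out6, out7);                              \
} while (0)

/* One macro call replaces eight separate "out[i] = a[i] + b[i];" lines. */
static void add8_rows(const int a[8], const int b[8], int out[8])
{
    EX_ADD8(a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3],
            a[4], b[4], a[5], b[5], a[6], b[6], a[7], b[7],
            out[0], out[1], out[2], out[3],
            out[4], out[5], out[6], out[7]);
}
```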

[FFmpeg-devel] [PATCH] avcodec/mips: msa optimizations for vc1dsp

2019-10-11 Thread gxw
.c b/libavcodec/mips/vc1dsp_msa.c new file mode 100644 index 000..1619ea4 --- /dev/null +++ b/libavcodec/mips/vc1dsp_msa.c @@ -0,0 +1,483 @@ +/* + * Loongson SIMD optimized vc1dsp + * + * Copyright (c) 2019 Loongson Technology Corporation Limited + *gxw + * + * This file is

[FFmpeg-devel] [PATCH] avcodec/mips: Fixed four warnings in vc1dsp

2019-10-11 Thread gxw
Change the stride argument to ptrdiff_t in the following functions: ff_put_no_rnd_vc1_chroma_mc8_mmi, ff_put_no_rnd_vc1_chroma_mc4_mmi, ff_avg_no_rnd_vc1_chroma_mc8_mmi, ff_avg_no_rnd_vc1_chroma_mc4_mmi. --- libavcodec/mips/vc1dsp_mips.h | 8 libavcodec/mips/vc1dsp_mmi.c | 8 2
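A short sketch of why a ptrdiff_t stride matters. It assumes the warnings came from assigning functions to a function-pointer type whose stride parameter is ptrdiff_t, which the snippet does not state; the typedef and function names below are hypothetical, and the copy routine is a placeholder, not the real chroma motion compensation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function-pointer type with a ptrdiff_t stride. */
typedef void (*chroma_mc_fn)(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t stride, int h);

/* The signature matches chroma_mc_fn exactly, so the assignment below needs
 * no cast. A ptrdiff_t stride is pointer-sized and may be negative. */
static void copy_rows8(uint8_t *dst, const uint8_t *src,
                       ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = src[y * stride + x];
}

static chroma_mc_fn pick_chroma_mc(void)
{
    return copy_rows8; /* compatible types: no incompatible-pointer warning */
}
```

If the function instead declared the stride as int, the same assignment would mix incompatible function-pointer types, which compilers typically diagnose.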

Re: [FFmpeg-devel] [PATCH] avcodec/mips: msa optimizations for vc1dsp

2019-10-21 Thread gxw
>>+TRANSPOSE4x4_SW_SW(in_l0, in_l1, in_l2, in_l3, t_l1, t_l2, t_l3, t_l4); >>+TRANSPOSE4x4_SW_SW(in_r4, in_r5, in_r6, in_r7, in_l0, in_l1, in_l2, in_l3); >>+TRANSPOSE4x4_SW_SW(in_l4, in_l5, in_l6, in_l7, in_l4, in_l5, in_l6, in_l7); >>+in_r4 = t_l1, in_r5 = t_l2,

[FFmpeg-devel] [PATCH v2] avcodec/mips: msa optimizations for vc1dsp

2019-10-21 Thread gxw
.c b/libavcodec/mips/vc1dsp_msa.c new file mode 100644 index 000..6e588e8 --- /dev/null +++ b/libavcodec/mips/vc1dsp_msa.c @@ -0,0 +1,461 @@ +/* + * Loongson SIMD optimized vc1dsp + * + * Copyright (c) 2019 Loongson Technology Corporation Limited + *gxw + * + * This file is

[FFmpeg-devel] [PATCH] avcodec/mips: [loongson] fix failed case: hevc-conformance-AMP_A_Samsung_* in loongson2k

2018-12-17 Thread gxw
AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value is still 32 in function ff_hevc_sao_edge_filter_8_msa. So, modify the corresponding value to 64. Fate tests passed. --- libavcodec/mips/hevc_lpf_sao_msa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/

[FFmpeg-devel] [PATCH v2] avcodec/mips: Fix failed case: hevc-conformance-AMP_A_Samsung_* when enable msa

2018-12-19 Thread gxw
The AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value is still 32 in function ff_hevc_sao_edge_filter_8_msa. So, use AV_INPUT_BUFFER_PADDING_SIZE directly. Fate tests passed. --- libavcodec/mips/hevc_lpf_sao_msa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --g

[FFmpeg-devel] [PATCH v3] avcodec/mips: Fix failed case: hevc-conformance-AMP_A_Samsung_* when enable msa

2018-12-23 Thread gxw
The AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value is still 32 in function ff_hevc_sao_edge_filter_8_msa. So, use AV_INPUT_BUFFER_PADDING_SIZE directly. Also, use MAX_PB_SIZE directly instead of 64. Fate tests passed. --- libavcodec/mips/hevc_lpf_sao_msa.c | 2 +- 1 file ch
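The failure mode described above can be sketched as a stride computed from hard-coded literals drifting away from the constants it was meant to mirror. The exact expression in ff_hevc_sao_edge_filter_8_msa is not shown in the snippet, so the formula below is an assumption, and the EX_ macros are local stand-ins (both set to 64, the values named in the messages) for the real FFmpeg constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for AV_INPUT_BUFFER_PADDING_SIZE and MAX_PB_SIZE. */
#define EX_INPUT_BUFFER_PADDING_SIZE 64
#define EX_MAX_PB_SIZE               64

/* Stale version: the literal 32 no longer matches the padding constant. */
static ptrdiff_t src_stride_hardcoded(void)
{
    return (ptrdiff_t) ((2 * 64 + 32) / sizeof(uint8_t));
}

/* Fixed version: derives the stride from the named constants, so a later
 * change to either constant is picked up automatically. */
static ptrdiff_t src_stride_from_constants(void)
{
    return (ptrdiff_t) ((2 * EX_MAX_PB_SIZE + EX_INPUT_BUFFER_PADDING_SIZE)
                        / sizeof(uint8_t));
}
```

Under these assumptions the two functions disagree (160 versus 192 bytes), which is the kind of mismatch that makes an optimized routine read the wrong rows of a buffer laid out by the generic code.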

[FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding in vp3dsp.

2018-12-26 Thread gxw
-$(CONFIG_H264QPEL) += mips/h264qpel_msa.o diff --git a/libavcodec/mips/vp3dsp_idct_msa.c b/libavcodec/mips/vp3dsp_idct_msa.c new file mode 100644 index 000..5427ac5 --- /dev/null +++ b/libavcodec/mips/vp3dsp_idct_msa.c @@ -0,0 +1,662 @@ +/* + * Copyright (c) 2018 gxw + * + * This

[FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding with mmi.

2019-02-12 Thread gxw
+++ b/libavcodec/mips/vp3dsp_idct_mmi.c @@ -0,0 +1,769 @@ +/* + * Copyright (c) 2018 gxw + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software

[FFmpeg-devel] [PATCH] avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions

2019-02-18 Thread gxw
diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c new file mode 100644 index 000..145bbff --- /dev/null +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -0,0 +1,680 @@ +/* + * Copyright (c) 2019 gxw + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can

Re: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions

2019-02-20 Thread gxw
> On Feb 21, 2019, at 9:55 AM, Shiyou Yin wrote: > >> -Original Message- >> From: ffmpeg-devel-boun...@ffmpeg.org >> <mailto:ffmpeg-devel-boun...@ffmpeg.org> >> [mailto:ffmpeg-devel-boun...@ffmpeg.org >> <mailto:ffmpeg-devel-boun...@ffmpeg.org>

[FFmpeg-devel] [PATCH v2] avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions

2019-02-21 Thread gxw
-$(CONFIG_VP9_DECODER)+= mips/vp9_mc_mmi.o diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c new file mode 100644 index 000..58a920b --- /dev/null +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -0,0 +1,692 @@ +/* + * Copyright (c) 2019 gxw + * + * This file is part of FFmpeg

[FFmpeg-devel] [PATCH v3] avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions

2019-02-25 Thread gxw
-$(CONFIG_VP9_DECODER)+= mips/vp9_mc_mmi.o diff --git a/libavcodec/mips/vp9_mc_mmi.c b/libavcodec/mips/vp9_mc_mmi.c new file mode 100644 index 000..e7a8387 --- /dev/null +++ b/libavcodec/mips/vp9_mc_mmi.c @@ -0,0 +1,628 @@ +/* + * Copyright (c) 2019 gxw + * + * This file is part of FFmpeg

Re: [FFmpeg-devel] [PATCH v2] avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions

2019-02-25 Thread gxw
> On Feb 24, 2019, at 10:55 AM, Shiyou Yin wrote: > > > >> -Original Message- >> From: ffmpeg-devel-boun...@ffmpeg.org >> <mailto:ffmpeg-devel-boun...@ffmpeg.org> >> [mailto:ffmpeg-devel-boun...@ffmpeg.org >> <mailto:ffmpeg-devel-boun