> On 19 Nov 2017, at 01:35, Rafal Dabrowa wrote:
>
>
> This is a proposal of performance optimizations for 8-bit
> hevc video decoding on aarch64 platform with neon (simd) extension.
Nice to see the work for aarch64!
We are also in the process of doing NEON optimization for HEVC decoding.
NEON optimization for sao
avcodec/hevcdsp: Add NEON optimization for idct16x16
Shengbin Meng (1):
avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation
libavcodec/arm/Makefile | 4 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2078
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_init_neon.c | 66 +
libavcodec/arm/hevcdsp_qpel_neon.S | 509 +
2 files changed, 575 insertions(+)
diff --git a/libavcodec/arm/hevcdsp_init_neon.c b/libavcodec/arm/hevcdsp_init_neon.c
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_init_neon.c | 62 +
libavcodec/arm/hevcdsp_sao_neon.S | 181 +
3 files changed, 245 insertions(+), 1 deletion(-)
create mode 100644
New code is written for qpel, and the qpel code is then reused for epel,
because whole-pixel interpolation is identical in qpel and epel.
Signed-off-by: Shengbin Meng
---
libavcodec/arm/hevcdsp_init_neon.c | 106 ++
libavcodec/arm/hevcdsp_qpel_neon.S | 177
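Why the whole-pixel case can be shared: when a motion vector has no fractional part, no filter taps are applied, so luma (qpel) and chroma (epel) prediction reduce to the same operation, a copy of the source samples scaled to the 14-bit intermediate precision used by HEVC prediction. The scalar sketch below covers 8-bit input only. `pel_pixels_8` and `DST_STRIDE` are hypothetical names, and the fixed destination stride of 64 elements is an assumption modelled on the intermediate-buffer layout of FFmpeg's C template, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed stride of the int16_t intermediate buffer, in elements. */
#define DST_STRIDE 64

/* Whole-pixel "interpolation" for 8-bit input: no taps are applied, so the
 * same routine serves both the qpel (luma) and epel (chroma) paths.
 * Each sample is shifted left by 14 - 8 = 6 to reach 14-bit precision. */
static void pel_pixels_8(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                         int height, int width)
{
    const int shift = 14 - 8;
    assert(width <= DST_STRIDE);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << shift);
        src += srcstride;   /* source stride is in bytes == samples for 8-bit */
        dst += DST_STRIDE;  /* destination rows are DST_STRIDE elements apart */
    }
}
```

Because the operation is a pure widen-and-shift, a single SIMD routine written for one path can serve the other unchanged.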
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_epel_neon.S | 10 ++
libavcodec/arm/hevcdsp_qpel_neon.S | 24
2 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/libavcodec/arm/hevcdsp_epel_neon.S b/libavcodec/arm/hevcdsp_epel_ne
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2068
libavcodec/arm/hevcdsp_init_neon.c | 459
3 files changed, 2529 insertions(+), 1 deletion(-)
create mode 100644 l
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_idct_neon.S | 241 +
libavcodec/arm/hevcdsp_init_neon.c | 2 +
2 files changed, 243 insertions(+)
diff --git a/libavcodec/arm/hevcdsp_idct_neon.S b/libavcodec/arm/hevcdsp_idct_neon.S
inde
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_init_neon.c | 67 +
libavcodec/arm/hevcdsp_qpel_neon.S | 509 +
2 files changed, 576 insertions(+)
diff --git a/libavcodec/arm/hevcdsp_init_neon.c b/libavcodec/arm/hevcdsp_init_neon.c
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2068
libavcodec/arm/hevcdsp_init_neon.c | 458
3 files changed, 2528 insertions(+), 1 deletion(-)
create mode 100644 l
New code is written for qpel, and the qpel code is then reused for epel,
because whole-pixel interpolation is identical in qpel and epel.
Signed-off-by: Shengbin Meng
---
libavcodec/arm/hevcdsp_init_neon.c | 107 ++
libavcodec/arm/hevcdsp_qpel_neon.S | 177
> On 22 Nov 2017, at 20:26, Michael Niedermayer wrote:
>
> On Wed, Nov 22, 2017 at 07:12:01PM +0800, Shengbin Meng wrote:
>> From: Meng Wang
>>
>> Signed-off-by: Meng Wang
>> ---
>> libavcodec/arm/hevcdsp_init_neon.c | 66 +
>
Hi,
I’d like to know whether anyone is working on, or interested in, ARM
optimization for the native HEVC decoder in FFmpeg.
We can see that some time-consuming operations in HEVC decoding have not been
optimized using NEON, e.g., qpel and epel interpolation, SAO, and IDCT of large
blocks.
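To make concrete what qpel interpolation involves, here is a scalar sketch of the horizontal-only luma case for 8-bit input, using the 8-tap filters defined by the HEVC specification for the quarter, half and three-quarter sample positions. `qpel_h_row_8` is a hypothetical name; a full decoder also needs the vertical and 2-D cases, whole blocks and other bit depths. The per-sample 8-tap multiply-accumulate is the kind of work SIMD multiply-accumulate instructions handle well.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC luma interpolation taps for mx = 1 (1/4), 2 (1/2) and 3 (3/4).
 * Each row sums to 64. */
static const int8_t qpel_taps[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Horizontal qpel for one row of 8-bit samples. Output x reads
 * src[x - 3] .. src[x + 4], so the caller must provide 3 samples of margin
 * on the left and 4 on the right. For 8-bit input the raw sum is already at
 * 14-bit intermediate precision (8 bits plus the 64x filter gain). */
static void qpel_h_row_8(int16_t *dst, const uint8_t *src, int width, int mx)
{
    assert(mx >= 1 && mx <= 3);
    const int8_t *f = qpel_taps[mx - 1];
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += f[k] * src[x + k - 3];
        dst[x] = (int16_t)sum;
    }
}
```

On a flat input of value v every position yields 64 * v, which is a quick sanity check for any optimized version.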
I have some optimizat
Hi,
Using the checkasm benchmark, I see a speedup of ~3x for band mode and ~6x for
edge mode on my device (the device has an aarch64 CPU, but I configured ffmpeg
with `--arch=arm`). FATE passed as well.
Results of a checkasm run:
$ ./tests/checkasm/checkasm --test=hevc_sao --bench
$ sudo ./tests/c
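For readers unfamiliar with the two SAO modes being benchmarked, a scalar sketch of what such NEON code accelerates may help. It follows the HEVC sample adaptive offset process as we understand it, simplified to 8-bit samples and a single row. `sao_band_8` and `sao_edge_h_8` are hypothetical names, not FFmpeg's functions; the edge sketch covers only the horizontal class and, as a simplification, copies the two border samples unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Band offset, 8-bit: the range is split into 32 bands of 8 values
 * (band = sample >> 3). Four consecutive bands starting at band_position
 * (wrapping modulo 32) receive offsets[0..3]; other bands get 0. */
static void sao_band_8(uint8_t *dst, const uint8_t *src, int width,
                       int band_position, const int offsets[4])
{
    int table[32] = {0};
    assert(band_position >= 0 && band_position < 32);
    for (int k = 0; k < 4; k++)
        table[(k + band_position) & 31] = offsets[k];
    for (int x = 0; x < width; x++)
        dst[x] = clip_u8(src[x] + table[src[x] >> 3]);
}

static int sign3(int v)
{
    return (v > 0) - (v < 0);
}

/* Edge offset, horizontal class only: each sample is compared with its
 * left and right neighbours. Border samples are copied unchanged here. */
static void sao_edge_h_8(uint8_t *dst, const uint8_t *src, int width,
                         const int offsets[4])
{
    /* 2 + sign(c - left) + sign(c - right) -> offset category:
     * 0 (local min) -> 1, 1 -> 2, 2 (no edge) -> 0, 3 -> 3, 4 (local max) -> 4 */
    static const int category[5] = {1, 2, 0, 3, 4};
    assert(width >= 2);
    dst[0] = src[0];
    dst[width - 1] = src[width - 1];
    for (int x = 1; x < width - 1; x++) {
        int idx = 2 + sign3(src[x] - src[x - 1]) + sign3(src[x] - src[x + 1]);
        int cat = category[idx];
        dst[x] = clip_u8(src[x] + (cat ? offsets[cat - 1] : 0));
    }
}
```

Band mode is a table lookup per sample, while edge mode adds two neighbour comparisons per sample, which helps explain why the two modes can see different SIMD speedups.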
The code looks good to me. I think the wrapper is fine, because that part of
the code is not suitable for NEON assembly.
But you can remove the use of `sizeof(uint8_t)`, as suggested by Carl.
Shengbin Meng
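Why the `sizeof(uint8_t)` division can go: code shared across bit depths, such as a template whose sample type is either `uint8_t` or `uint16_t`, divides byte strides by the sample size to obtain strides in samples. In an 8-bit-only wrapper the divisor is `sizeof(uint8_t)`, which C guarantees to be 1, so the statement is a no-op. A small sketch with a hypothetical helper `bytes_to_elems`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Template-style conversion: a stride in bytes becomes a stride in
 * elements by dividing by the element size (1 for 8-bit, 2 for 16-bit). */
static ptrdiff_t bytes_to_elems(ptrdiff_t stride_bytes, size_t elem_size)
{
    assert(elem_size > 0 && stride_bytes % (ptrdiff_t)elem_size == 0);
    return stride_bytes / (ptrdiff_t)elem_size;
}

/* uint8_t has exactly 8 bits and no padding, so it always occupies one
 * byte: dividing an 8-bit stride by sizeof(uint8_t) changes nothing. */
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");
```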
> On 19 Mar 2018, at 12:41, Yingming Fan wrote:
>
> Hi, is there any review a
> On 22 Mar 2018, at 20:51, Yingming Fan wrote:
>
> From: Meng Wang
>
> Signed-off-by: Meng Wang
> ---
> This v2 patch removes the unused code 'stride_dst /= sizeof(uint8_t);' compared
> to v1. V1 had this code because we referred to the hevc dsp template code.
>
> As FFmpeg hevc decoder have n
LGTM.
Regards,
Shengbin Meng
> On 27 Mar 2018, at 20:43, Yingming Fan wrote:
>
> From: Meng Wang
>
> Signed-off-by: Meng Wang
> ---
> This v3 patch removes the unused code 'stride_dst /= sizeof(uint8_t);' compared
> to v1. V1 had this code because we r
> On Apr 9, 2018, at 10:12, Yingming Fan wrote:
>
> From: Yingming Fan
>
> ---
> Hi, there.
> I plan to submit our arm32 NEON code for qpel and epel.
> Before that, I will submit the hevc_mc checkasm code.
> This hevc_mc checkasm code checks every qpel and epel function, including 8
> 10
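The idea behind such a checkasm test, as we understand it, is to feed identical random input to the C reference and to the optimized function and compare the outputs (checkasm can also time them). The sketch below shows only the comparison step in plain C. It does not use checkasm's real macros; `copy_ref`, `copy_new` and `check_pair` are hypothetical stand-ins, the "optimized" version is merely an unrolled loop, and the list of widths is illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*pel_fn)(int16_t *dst, const uint8_t *src, int width);

/* Reference: plain C whole-pixel copy scaled to 14-bit precision. */
static void copy_ref(int16_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (int16_t)(src[x] << 6);
}

/* Stand-in for an optimized version: 4x unrolled body plus a scalar tail. */
static void copy_new(int16_t *dst, const uint8_t *src, int width)
{
    int x = 0;
    for (; x + 4 <= width; x += 4) {
        dst[x]     = (int16_t)(src[x]     << 6);
        dst[x + 1] = (int16_t)(src[x + 1] << 6);
        dst[x + 2] = (int16_t)(src[x + 2] << 6);
        dst[x + 3] = (int16_t)(src[x + 3] << 6);
    }
    for (; x < width; x++)
        dst[x] = (int16_t)(src[x] << 6);
}

/* Small deterministic pseudo-random generator returning 0..255. */
static uint32_t lcg(uint32_t *s)
{
    *s = *s * 1664525u + 1013904223u;
    return *s >> 24;
}

/* Returns 1 if ref and opt produce identical output for every width. */
static int check_pair(pel_fn ref, pel_fn opt, const int *widths, int n)
{
    uint32_t seed = 1;
    uint8_t src[64];
    int16_t out_ref[64], out_new[64];
    for (int i = 0; i < n; i++) {
        int w = widths[i];
        assert(w > 0 && w <= 64);
        for (int x = 0; x < w; x++)
            src[x] = (uint8_t)lcg(&seed);
        memset(out_ref, 0, sizeof(out_ref));
        memset(out_new, 0, sizeof(out_new));
        ref(out_ref, src, w);
        opt(out_new, src, w);
        if (memcmp(out_ref, out_new, sizeof(out_ref)) != 0)
            return 0;
    }
    return 1;
}
```

Clearing both output buffers before each call means a function that writes past the requested width is also caught by the comparison.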