Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread J. Dekker
Martin Storsjö writes: > On Wed, 28 Feb 2024, J. Dekker wrote: > >> >> Martin Storsjö writes: >> >>> On Wed, 28 Feb 2024, J. Dekker wrote: >>> Martin Storsjö writes: > On Tue, 27 Feb 2024, J. Dekker wrote: > >> Benched using single-threaded full decode on an Ampere A

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread Martin Storsjö
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread J. Dekker
Martin Storsjö writes: > On Wed, 28 Feb 2024, J. Dekker wrote: > >> >> Martin Storsjö writes: >> >>> On Tue, 27 Feb 2024, J. Dekker wrote: >>> Benched using single-threaded full decode on an Ampere Altra. Bpp Before After Speedup 8 73,3s 65,2s 1.124x 10 114,2s

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread Martin Storsjö
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x

Signed-off-by: J. Dekk

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread J. Dekker
Martin Storsjö writes:

> On Tue, 27 Feb 2024, J. Dekker wrote:
>
>> Benched using single-threaded full decode on an Ampere Altra.
>>
>> Bpp  Before  After   Speedup
>> 8    73,3s   65,2s   1.124x
>> 10   114,2s  104,0s  1.098x
>> 12   125,8s  115,7s  1.087x
>>
>> Signed-off-by: J. Dekker
>> ---
>>
>> S

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-27 Thread Martin Storsjö
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x

Signed-off-by: J. Dekker
---

Slightly improved 12bit version.

libavcodec/aarch64/hevcd

[FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-27 Thread J. Dekker
Benched using single-threaded full decode on an Ampere Altra.

Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x

Signed-off-by: J. Dekker
---

Slightly improved 12bit version.

libavcodec/aarch64/hevcdsp_deblock_neon.S | 417 +++