[FFmpeg-devel] [PATCH] riscv: detect fast CLZ from Zbb extension

2022-09-03 Thread remi
From: Rémi Denis-Courmont RISC-V defines the CLZ instruction as part of the Zbb subset of the bit manipulation extension (B). We can detect it from the __riscv_zbb predefined constant. It will be non-zero if supported, zero if enabled in the compiler flags but not supported by the compiler, and und
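
A minimal sketch of the detection described above, assuming a GCC- or Clang-compatible compiler. `HAVE_FAST_CLZ_SKETCH` and `clz32` are hypothetical names chosen for illustration; they are not the identifiers used by the patch.

```c
#include <stdint.h>

/* An undefined macro evaluates to 0 inside #if, so one test treats the
 * "not defined" and "defined as zero" cases alike. */
#if defined(__riscv_zbb) && __riscv_zbb
#define HAVE_FAST_CLZ_SKETCH 1
#else
#define HAVE_FAST_CLZ_SKETCH 0
#endif

/* Count the leading zero bits of a 32-bit value (32 for an input of 0). */
static inline int clz32(uint32_t x)
{
    if (!x)
        return 32;              /* the builtin is undefined for 0 */
#if HAVE_FAST_CLZ_SKETCH
    /* With Zbb enabled the compiler can emit a single instruction here. */
    return __builtin_clz(x);
#else
    /* Portable fallback: shift left until the top bit is set. */
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

For example, `clz32(1)` yields 31 and `clz32(0x80000000u)` yields 0 on either code path.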

[FFmpeg-devel] [PATCH] riscv: add av_bswap{16,32,64} with Zbb

2022-09-03 Thread remi
From: Rémi Denis-Courmont If the target supports the Basic bit-manipulation (Zbb) extension, then REV8 is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size (available as __riscv_xlen). --- libavutil/bswap.h | 2 ++ libavutil/riscv/bswap.h

[FFmpeg-devel] [PATCH 2/2] arm: relax byte-swap assembler constraints

2022-09-03 Thread remi
From: Rémi Denis-Courmont There is no particular reason to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select

[FFmpeg-devel] [PATCH 1/2] aarch64: relax byte-swap assembler constraints

2022-09-03 Thread remi
From: Rémi Denis-Courmont There is no particular reason to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select
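
The difference between a tied and a relaxed operand constraint can be sketched as follows. This is an illustration under assumptions, not the patched code: the function names are hypothetical, the AArch64 `rev` form is only one way to write the swap, and other targets fall back to the GCC/Clang builtin `__builtin_bswap32`.

```c
#include <stdint.h>

/* Tied operand: "+r" makes one register both input and output, so the
 * original value of x is destroyed by the instruction. */
static inline uint32_t bswap32_tied(uint32_t x)
{
#if defined(__aarch64__)
    __asm__ ("rev %w0, %w0" : "+r" (x));
    return x;
#else
    return __builtin_bswap32(x);
#endif
}

/* Relaxed operands: "=r" and "r" let the compiler choose the registers.
 * It may still pick the same one, but it no longer has to copy x first
 * when x is needed again after the swap. */
static inline uint32_t bswap32_relaxed(uint32_t x)
{
#if defined(__aarch64__)
    uint32_t y;
    __asm__ ("rev %w0, %w1" : "=r" (y) : "r" (x));
    return y;
#else
    return __builtin_bswap32(x);
#endif
}
```

Both functions return the same value; only the register allocation freedom differs.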

[FFmpeg-devel] [PATCH 2/3] riscv: initial common header for assembler macros

2022-09-03 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.h | 33 + 1 file changed, 33 insertions(+) create mode 100644 libavutil/riscv/asm.h diff --git a/libavutil/riscv/asm.h b/libavutil/riscv/asm.h new file mode 100644 index 00..31001b8bdb --- /dev/null +++ b

[FFmpeg-devel] [PATCH 1/3] riscv: add CPU flags for the RISC-V Vector extension

2022-09-03 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions: V, Zvl32b, Zvl64b, Zvl128b, Zvl256b, Zvl512b, Zvl1024b, Zve32x, Zve32f, Zve64x, Zve64f and Zve64d. At this stage, we don't care about the vector length extensions Zvl*, as most or all optimisations will be running in a loo

[FFmpeg-devel] [PATCH 3/3] riscv: add float vector-scalar multiplication

2022-09-03 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree, though the size and scalar arguments are swapped. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 ++- libavutil/riscv/float_dsp_init.c | 4

[FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector multiplication with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 9 - libavutil/riscv/float_dsp_rvv.S | 34 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 279412c0

[FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 + libavutil/riscv/float_dsp_rvv.S | 42 2 files changed, 48 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 4135284c76..a1bb112ec7 1006

[FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index a1bb112ec7..8539fe9ac5 100644 --- a/libavu

[FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 8539fe9ac5..2165394585 100644 --- a/libavuti

[FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 22 ++ 2 files changed, 25 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 2165394585..1183460181 100644 --- a/lib

[FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 1183460181..887706d899 100644

[FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 23 +++ 2 files changed, 25 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 887706d899..7c2fc10e99 100644 --- a/lib

[FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension

2022-09-04 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions: V, Zvl32b, Zvl64b, Zvl128b, Zvl256b, Zvl512b, Zvl1024b, Zve32x, Zve32f, Zve64x, Zve64f and Zve64d. At this stage, we don't expose the vector length extensions Zvl*, as the vector length is most commonly determined at run-t

[FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros

2022-09-04 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 33 + 1 file changed, 33 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..31001b8bdb --- /dev/null +++ b

[FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV

2022-09-04 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 ++- libavutil/risc

[FFmpeg-devel] [PATCH] lavu/riscv: cycle counter for AV_READ_TIME

2022-09-05 Thread remi
From: Rémi Denis-Courmont This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessar
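
The retry loop needed in 32-bit mode can be sketched in portable C. The code below is an assumption-laden illustration: `read_split_counter`, `read_half_fn` and the simulated counter are hypothetical, and on real hardware the two callbacks would be replaced by reads of the low and high counter CSRs.

```c
#include <stdint.h>

/* Hypothetical callback type: returns one 32-bit half of the counter. */
typedef uint32_t (*read_half_fn)(void);

/* Read a 64-bit counter exposed as two 32-bit halves. If the high half
 * changed between the first and the last sample, the low half wrapped
 * around in between, so the whole read is retried. */
static uint64_t read_split_counter(read_half_fn read_hi, read_half_fn read_lo)
{
    uint32_t hi, lo, check;

    do {
        hi    = read_hi();
        lo    = read_lo();
        check = read_hi();
    } while (hi != check);

    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter that advances by one on every access, standing in
 * for a free-running hardware counter. */
static uint64_t sim_counter;
static uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter++ >> 32); }
static uint32_t sim_read_lo(void) { return (uint32_t)(sim_counter++); }
```

Starting the simulated counter just below a 32-bit boundary shows why the loop matters: a single high/low pair would combine an old high half with a wrapped low half, while the loop retries until both halves belong to the same epoch.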

[FFmpeg-devel] [PATCH 1/5] doc: reference the RISC-V specification

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- doc/optimization.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/doc/optimization.txt b/doc/optimization.txt index 974e2f9af2..3ed29fe38c 100644 --- a/doc/optimization.txt +++ b/doc/optimization.txt @@ -267,6 +267,11 @@ CELL/SPU: http://www-01.ibm.com

[FFmpeg-devel] [PATCH 2/5] lavu/riscv: AV_READ_TIME cycle counter

2022-09-06 Thread remi
From: Rémi Denis-Courmont This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessar

[FFmpeg-devel] [PATCH 3/5] configure/riscv: detect fast CLZ

2022-09-06 Thread remi
From: Rémi Denis-Courmont RISC-V defines the CLZ instruction as part of the ratified Zbb subset of the (not yet ratified) bit manipulation extension (B). We can detect it from the __riscv_zbb predefined constant. At least GCC 12 already supports this correctly. Note that the macro will be non-zero

[FFmpeg-devel] [PATCH 4/5] lavu/riscv: byte-swap operations

2022-09-06 Thread remi
From: Rémi Denis-Courmont If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is n
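
The "swap at XLEN, then shift down" idea can be sketched as below. This is not the patch itself: `bswap32_sketch` and `bswap16_sketch` are hypothetical names, the inline assembly assumes a GCC- or Clang-compatible compiler targeting a standard RISC-V ABI where `unsigned long` is XLEN bits wide, and the fallback branch is a plain shift-and-mask version for targets without Zbb.

```c
#include <stdint.h>

static inline uint32_t bswap32_sketch(uint32_t x)
{
#if defined(__riscv_zbb) && __riscv_zbb
    unsigned long y;
    /* REV8 reverses all XLEN/8 bytes of the register, so the swapped
     * 32-bit value ends up in the most significant bits. */
    __asm__ ("rev8 %0, %1" : "=r" (y) : "r" ((unsigned long)x));
    return (uint32_t)(y >> (__riscv_xlen - 32));
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

static inline uint16_t bswap16_sketch(uint16_t x)
{
#if defined(__riscv_zbb) && __riscv_zbb
    unsigned long y;
    __asm__ ("rev8 %0, %1" : "=r" (y) : "r" ((unsigned long)x));
    /* Same instruction, larger shift for the narrower data width. */
    return (uint16_t)(y >> (__riscv_xlen - 16));
#else
    return (uint16_t)((x >> 8) | (x << 8));
#endif
}
```

On a 32-bit target the 32-bit shift amount is zero, so the same expression covers both register sizes.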

[FFmpeg-devel] [PATCH 5/5] lavu/riscv: add optimisations

2022-09-06 Thread remi
From: Rémi Denis-Courmont This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension. --- libavutil/intmath.h | 5 +- libavutil/riscv/intmath.h | 99 +++ 2 files changed, 102 insertions(+), 2 de

[FFmpeg-devel] [PATCH 5/5] lavu/riscv: add optimisations

2022-09-06 Thread remi
From: Rémi Denis-Courmont This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension. --- libavutil/intmath.h | 5 +- libavutil/riscv/intmath.h | 103 ++ 2 files changed, 106 insertions(+), 2 d

[FFmpeg-devel] [PATCH 01/12] lavu/riscv: add CPU flags for the RISC-V Vector extension

2022-09-06 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Z
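
The way the listed subsets build on each other can be modelled with bit flags. The sketch below uses hypothetical names throughout and covers only the four subsets whose definitions are fully visible in the preview above; it is not the flag layout introduced by the patch.

```c
/* Hypothetical capability bits, one per element type. */
enum {
    CAP_I32 = 1 << 0,   /* 8-, 16- and 32-bit integers */
    CAP_F32 = 1 << 1,   /* single precision floats */
    CAP_I64 = 1 << 2    /* 64-bit integers */
};

/* Each subset is the union of the capabilities it is described as
 * including. */
enum {
    SUBSET_ZVE32X = CAP_I32,
    SUBSET_ZVE32F = SUBSET_ZVE32X | CAP_F32,
    SUBSET_ZVE64X = SUBSET_ZVE32X | CAP_I64,
    SUBSET_ZVE64F = SUBSET_ZVE32F | SUBSET_ZVE64X
};

/* True if every capability in `want` is also present in `have`. */
static inline int subset_includes(unsigned have, unsigned want)
{
    return (have & want) == want;
}
```

With this encoding, a routine that needs single precision floats can test for `SUBSET_ZVE32F` and will automatically accept a CPU reporting `SUBSET_ZVE64F`.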

[FFmpeg-devel] [PATCH 02/12] checkasm: register the RISC-V V subsets

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- tests/checkasm/checkasm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index e56fd3850e..a5d0503811 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -226,6 +226,11 @@ static c

[FFmpeg-devel] [PATCH 07/12] lavu/riscv: float vector multiplication-addition with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index a1bb112ec7..8539fe9ac5 100644 --- a/libavu

[FFmpeg-devel] [PATCH 06/12] lavu/riscv: float vector multiply-accumulate with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 + libavutil/riscv/float_dsp_rvv.S | 38 2 files changed, 44 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 4135284c76..a1bb112ec7 1006

[FFmpeg-devel] [PATCH 03/12] lavu/riscv: initial common header for assembler macros

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..7623c161cf --- /dev/

[FFmpeg-devel] [PATCH 08/12] lavu/riscv: float vector sum-and-difference with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 8539fe9ac5..2165394585 100644 --- a/libavuti

[FFmpeg-devel] [PATCH 09/12] lavu/riscv: float reversed vector multiplication with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 22 ++ 2 files changed, 25 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 2165394585..1183460181 100644 --- a/lib

[FFmpeg-devel] [PATCH 10/12] lavu/riscv: float vector windowed overlap/add with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 1183460181..887706d899 100644

[FFmpeg-devel] [PATCH 11/12] lavu/riscv: float vector dot product with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 23 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 887706d899..7c2fc10e99 100644 --- a/libav

[FFmpeg-devel] [PATCH 04/12] lavu/riscv: float vector-scalar multiplication with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 ++- libavutil/risc

[FFmpeg-devel] [PATCH 12/12] lavu/riscv: fixed vector sum-and-difference with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 2 ++ libavutil/riscv/fixed_dsp_init.c | 33 +++ libavutil/riscv/fixed_dsp_rvv.S | 38

[FFmpeg-devel] [PATCH 05/12] lavu/riscv: float vector-vector multiplication with RVV

2022-09-06 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 9 - libavutil/riscv/float_dsp_rvv.S | 34 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 279412c0

[FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- doc/optimization.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/doc/optimization.txt b/doc/optimization.txt index 974e2f9af2..3ed29fe38c 100644 --- a/doc/optimization.txt +++ b/doc/optimization.txt @@ -267,6 +267,11 @@ CELL/SPU: http://www-01.ibm.com

[FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ

2022-09-09 Thread remi
From: Rémi Denis-Courmont RISC-V defines the CLZ instruction as part of the ratified Zbb subset of the (not yet ratified) bit manipulation extension (B). We can detect it from the __riscv_zbb predefined constant. At least GCC 12 already supports this correctly. Note that the macro will be non-zero

[FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter

2022-09-09 Thread remi
From: Rémi Denis-Courmont This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessar

[FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations

2022-09-09 Thread remi
From: Rémi Denis-Courmont If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is n

[FFmpeg-devel] [PATCH 05/18] lavu/riscv: add optimisations

2022-09-09 Thread remi
From: Rémi Denis-Courmont This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension. --- libavutil/intmath.h | 5 +- libavutil/riscv/intmath.h | 103 ++ 2 files changed, 106 insertions(+), 2 d

[FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index b63da72acd..9b31ed2ed1 100644 --- a/libavu

[FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index e6a5efbf68..99cc8afd31 100644

[FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 9b31ed2ed1..4980214821 100644 --- a/libavuti

[FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 22 ++ 2 files changed, 25 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 4980214821..e6a5efbf68 100644 --- a/lib

[FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 23 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 99cc8afd31..9c5e06bae9 100644 --- a/libav

[FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++- libavutil/riscv/fixed_dsp_init.c | 33 +++ libavutil/riscv/fixed_dsp_rvv.S | 38

[FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- configure| 15 +++ ffbuild/arch.mak | 2 ++ 2 files changed, 17 insertions(+) diff --git a/configure b/configure index b7dc1d8656..c5f20cc323 100755 --- a/configure +++ b/configure @@ -462,6 +462,7 @@ Optimization options (experts only): --d

[FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- tests/checkasm/checkasm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index e56fd3850e..a5d0503811 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -226,6 +226,11 @@ static c

[FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..7623c161cf --- /dev/

[FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 ++- libavutil/risc

[FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector multiplication with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 9 - libavutil/riscv/float_dsp_rvv.S | 34 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 7c553e91

[FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension

2022-09-09 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Z

[FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate with RVV

2022-09-09 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 + libavutil/riscv/float_dsp_rvv.S | 38 2 files changed, 44 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 49a4c95a0b..b63da72acd 1006

[FFmpeg-devel] [PATCH 01/18] doc: reference the RISC-V specification

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- doc/optimization.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/doc/optimization.txt b/doc/optimization.txt index 974e2f9af2..3ed29fe38c 100644 --- a/doc/optimization.txt +++ b/doc/optimization.txt @@ -267,6 +267,11 @@ CELL/SPU: http://www-01.ibm.com

[FFmpeg-devel] [PATCH 02/18] lavu/riscv: AV_READ_TIME cycle counter

2022-09-12 Thread remi
From: Rémi Denis-Courmont This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessar

[FFmpeg-devel] [PATCH 03/18] configure/riscv: detect fast CLZ

2022-09-12 Thread remi
From: Rémi Denis-Courmont RISC-V defines the CLZ instruction as part of the ratified Zbb subset of the (not yet ratified) bit manipulation extension (B). We can detect it from the __riscv_zbb predefined constant. At least GCC 12 already supports this correctly. Note that the macro will be non-zero

[FFmpeg-devel] [PATCH 04/18] lavu/riscv: byte-swap operations

2022-09-12 Thread remi
From: Rémi Denis-Courmont If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is n

[FFmpeg-devel] [PATCH 05/18] lavu/riscv: add optimisations

2022-09-12 Thread remi
From: Rémi Denis-Courmont This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension. --- libavutil/intmath.h | 5 +- libavutil/riscv/intmath.h | 103 ++ 2 files changed, 106 insertions(+), 2 d

[FFmpeg-devel] [PATCH 07/18] lavu/riscv: initial common header for assembler macros

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..7623c161cf --- /dev/

[FFmpeg-devel] [PATCH 06/18] configure: probe RISC-V Vector extension

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- Makefile | 2 +- configure| 15 +++ ffbuild/arch.mak | 2 ++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 61f79e27ae..1fb742f390 100644 --- a/Makefile +++ b/Makefile @@ -91,7 +91,7 @@ ffbuild/

[FFmpeg-devel] [PATCH 09/18] checkasm: register the RISC-V V subsets

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- tests/checkasm/checkasm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index e56fd3850e..a5d0503811 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -226,6 +226,11 @@ static c

[FFmpeg-devel] [PATCH 08/18] lavu/riscv: add CPU flags for the RISC-V Vector extension

2022-09-12 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Z

[FFmpeg-devel] [PATCH 11/18] lavu/riscv: float vector-vector multiplication with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 9 - libavutil/riscv/float_dsp_rvv.S | 34 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index f1d3d528

[FFmpeg-devel] [PATCH 10/18] lavu/riscv: float vector-scalar multiplication with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 ++- libavutil/risc

[FFmpeg-devel] [PATCH 12/18] lavu/riscv: float vector multiply-accumulate with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 + libavutil/riscv/float_dsp_rvv.S | 38 2 files changed, 44 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 903da4eeda..1381eadab6 1006

[FFmpeg-devel] [PATCH 18/18] lavu/riscv: fixed vector sum-and-difference with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++- libavutil/riscv/fixed_dsp_init.c | 36 ++ libavutil/riscv/fixed_dsp_rvv.S | 38 +

[FFmpeg-devel] [PATCH 17/18] lavu/riscv: float vector dot product with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 23 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index cf8c995d7c..055cdc7520 100644 --- a/libav

[FFmpeg-devel] [PATCH 13/18] lavu/riscv: float vector multiplication-addition with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 1381eadab6..9bc1976d04 100644 --- a/libavu

[FFmpeg-devel] [PATCH 16/18] lavu/riscv: float vector windowed overlap/add with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index ae089d2fdb..cf8c995d7c 100644

[FFmpeg-devel] [PATCH 14/18] lavu/riscv: float vector sum-and-difference with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 9bc1976d04..c2b72c3b25 100644 --- a/libavuti

[FFmpeg-devel] [PATCH 15/18] lavu/riscv: float reversed vector multiplication with RVV

2022-09-12 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 22 ++ 2 files changed, 25 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index c2b72c3b25..ae089d2fdb 100644 --- a/lib

[FFmpeg-devel] [PATCH] lavc: avoid rounding errors in float constants

2022-09-13 Thread remi
From: Rémi Denis-Courmont INT_MAX is (typically) a value with 31 significant bits but float can only represent 23 significant bits, leading to a rounding error. This substitutes the actual rounded value to avoid a clang warning: warning: implicit conversion from 'int' to 'float' changes value

[FFmpeg-devel] [PATCHv2] lavc: avoid rounding errors in float constants

2022-09-13 Thread remi
From: Rémi Denis-Courmont INT_MAX is (typically) a value with 31 significant bits but float can only represent 23 significant bits, leading to a rounding error. This substitutes the actual rounded value as an unsigned int, to avoid a clang warning while not overflowing signed int: warning: imp
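
The rounding in question can be reproduced with a short sketch, assuming a 32-bit `int` and an IEEE 754 binary32 `float` (23 stored fraction bits, 24 bits of precision with the implicit leading bit). The function names are hypothetical and the constant form is one possible spelling, not necessarily the one used in the patch.

```c
#include <limits.h>

/* What an implicit conversion does: INT_MAX (2^31 - 1) is not
 * representable, so it is rounded to the nearest float, 2^31. */
static inline float int_max_rounded(void)
{
    return (float)INT_MAX;
}

/* The same value written exactly: INT_MAX + 1U is computed in unsigned
 * arithmetic, so it does not overflow, and 2^31 converts to float
 * without any rounding. */
static inline float int_max_exact(void)
{
    return (float)(INT_MAX + 1U);
}
```

Both functions return 2147483648.0f; the second form simply states the value that the first one produces silently.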

[FFmpeg-devel] [PATCH] lavc/audiodsp: fix aliasing violation

2022-09-13 Thread remi
From: Rémi Denis-Courmont Even though they have the same size, and typically the same alignment, uint32_t and float are under no circumstances compatible types in C. The casts from float * to uint32_t * are invalid here. Insofar as the resulting pointers are dereferenced, this is undefined behav

[FFmpeg-devel] [PATCHv2] lavc/audiodsp: fix aliasing violation

2022-09-13 Thread remi
From: Rémi Denis-Courmont Even though they have the same size, and typically the same alignment, uint32_t and float are under no circumstances compatible types in C. The casts from float * to uint32_t * are invalid here. Insofar as the resulting pointers are dereferenced, this is undefined behav
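
One well-defined way to inspect the bits of a float without casting pointers is to copy the object representation. The sketch below is a common alternative, not necessarily the fix chosen in the patch; the helper names are hypothetical, and it assumes `float` is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into an integer of the same size. Unlike
 * *(uint32_t *)&f, this does not access a float object through an
 * incompatible lvalue type. */
static inline uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* The inverse operation. */
static inline float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

Compilers typically reduce these fixed-size copies to a register move, so the well-defined form need not cost anything at run time.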

[FFmpeg-devel] [PATCH 1/1] lavu/riscv: fix av_clip_int16

2022-09-14 Thread remi
From: Rémi Denis-Courmont Some serious copy-paste / squash / rebase mismanipulation here. --- libavutil/riscv/intmath.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h index 78f7ba930a..3263a79dc4 100644 --- a/libavuti

[FFmpeg-devel] [PATCH 1/3] lavu: detect RISC-V F extension (i.e. float)

2022-09-14 Thread remi
From: Rémi Denis-Courmont This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without the F extension, and if it does, it probably won't have run-time detection. So the flag is essentially always set. But as things stand,

[FFmpeg-devel] [PATCH 2/3] lavu/riscv: initial common header for assembler macros

2022-09-14 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..7623c161cf --- /dev/

[FFmpeg-devel] [PATCH 3/3] lavc/audiodsp: add RISC-V F float vector clip

2022-09-14 Thread remi
From: Rémi Denis-Courmont RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarked on SiFive U74-MC: audiodsp.ve
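
Clipping "properly" in floating point amounts to a per-element minimum and maximum. A plain C sketch of that operation follows; the function name and signature are assumptions made for illustration, and the patch itself implements the operation in assembler rather than C.

```c
/* Clamp each of the len input samples to the range [min, max].
 * The comparisons are done on float values, so a compiler targeting a
 * CPU with floating point min/max instructions may use them directly. */
static inline void vector_clipf_sketch(float *dst, const float *src, int len,
                                       float min, float max)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];

        if (v < min)
            v = min;
        if (v > max)
            v = max;
        dst[i] = v;
    }
}
```

For instance, clipping the samples -2.0, -0.5, 0.5 and 2.0 to the range [-1, 1] gives -1.0, -0.5, 0.5 and 1.0.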

[FFmpeg-devel] [PATCH] lfg: fix comment typo

2022-09-15 Thread remi
From: Rémi Denis-Courmont --- libavutil/lfg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/lfg.h b/libavutil/lfg.h index 2b669205d1..9a1e277acd 100644 --- a/libavutil/lfg.h +++ b/libavutil/lfg.h @@ -27,7 +27,7 @@ /** * Context structure for the Lagged Fibonacc

[FFmpeg-devel] [PATCH] lavu/riscv: fix off-by-one in bit-magnitude clip

2022-09-15 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/intmath.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/riscv/intmath.h b/libavutil/riscv/intmath.h index 3263a79dc4..45bce9a0e7 100644 --- a/libavutil/riscv/intmath.h +++ b/libavutil/riscv/intmath.h @@ -61,8 +61,8

[FFmpeg-devel] [PATCH] lavc/fmtconvert: remove dead int32_to_float

2022-09-16 Thread remi
From: Rémi Denis-Courmont This is not used anywhere and has no implementations other than the plain C one. --- libavcodec/fmtconvert.c | 9 - libavcodec/fmtconvert.h | 10 -- 2 files changed, 19 deletions(-) diff --git a/libavcodec/fmtconvert.c b/libavcodec/fmtconvert.c index f

[FFmpeg-devel] [PATCHv2] lavc/fmtconvert: remove dead int32_to_float

2022-09-16 Thread remi
From: Rémi Denis-Courmont This is no longer used since 46089967722f74e794865a044f5f682f26628802. It also has no implementations other than the plain C one. --- libavcodec/fmtconvert.c | 9 - libavcodec/fmtconvert.h | 10 -- 2 files changed, 19 deletions(-) diff --git a/libavcod

[FFmpeg-devel] [PATCH 2/2] lavc/vorbisdec: use intermediate variables

2022-09-17 Thread remi
From: Rémi Denis-Courmont The compiler cannot infer that the two float vectors do not alias, causing unnecessary extra loads and serialisation. This patch caches the two input values in local variables so that the compiler can optimise individual loop iterations. --- libavcodec/vorbisdec.c | 22
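
The effect of caching the inputs in locals can be illustrated on a simpler loop than the one in the patch. The sketch below is a generic sum-and-difference loop with a hypothetical name; it only demonstrates the technique, not the actual vorbisdec code.

```c
#include <stddef.h>

/* In-place sum and difference of two vectors. Both elements are loaded
 * into locals before anything is stored, so the compiler does not have
 * to reload b[i] after the store to a[i] in case the arrays overlap. */
static inline void sum_diff_sketch(float *a, float *b, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        const float x = a[i];
        const float y = b[i];

        a[i] = x + y;
        b[i] = x - y;
    }
}
```

Writing `a[i] = a[i] + b[i]; b[i] = a[i] - b[i];` instead would both change the result and force the second statement to wait for the first store.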

[FFmpeg-devel] [PATCH 1/2] lavc/vorbisdec: use ptrdiff_t to iterate over intptr_t

2022-09-17 Thread remi
From: Rémi Denis-Courmont While this probably never overflows, we are better safe than sorry. The callback prototype should probably also use ptrdiff_t or size_t but I digress. --- libavcodec/vorbisdec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/vorbisdec.

[FFmpeg-devel] [PATCH 1/6] lavu/cpu: detect RISC-V base extensions

2022-09-17 Thread remi
From: Rémi Denis-Courmont This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But a
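
As a rough illustration of the compile-time half of such detection, the sketch below derives a flag mask from the compiler's predefined RISC-V extension macros. It is an assumption-laden example rather than the patch's code: the `SKETCH_CPU_FLAG_*` names and the function name are hypothetical, and it assumes a toolchain that defines the `__riscv_i`, `__riscv_f` and `__riscv_d` test macros when the corresponding extensions are enabled. Run-time detection is not shown.

```c
/* Hypothetical flag values for illustration only; these are not FFmpeg's
 * actual CPU flag constants. */
#define SKETCH_CPU_FLAG_RVI (1 << 0) /* base integer instruction set */
#define SKETCH_CPU_FLAG_RVF (1 << 1) /* single-precision floats */
#define SKETCH_CPU_FLAG_RVD (1 << 2) /* double-precision floats */

/* Returns the flags implied by the compiler's target options. On a
 * non-RISC-V target, or with a toolchain that does not define these
 * macros, the result is simply 0. */
static int sketch_compile_time_cpu_flags(void)
{
    int flags = 0;

#ifdef __riscv_i
    flags |= SKETCH_CPU_FLAG_RVI;
#endif
#ifdef __riscv_f
    flags |= SKETCH_CPU_FLAG_RVF;
#endif
#ifdef __riscv_d
    flags |= SKETCH_CPU_FLAG_RVD;
#endif
    return flags;
}
```

A run-time probe would typically OR its findings into the same mask, so that code compiled for a baseline target can still pick faster paths on a more capable CPU.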

[FFmpeg-devel] [PATCH 2/6] lavu/cpu: CPU flags for the RISC-V Vector extension

2022-09-17 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Z

[FFmpeg-devel] [PATCH 3/6] configure: probe RISC-V Vector extension

2022-09-17 Thread remi
From: Rémi Denis-Courmont --- Makefile | 2 +- configure| 15 +++ ffbuild/arch.mak | 2 ++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 61f79e27ae..1fb742f390 100644 --- a/Makefile +++ b/Makefile @@ -91,7 +91,7 @@ ffbuild/

[FFmpeg-devel] [PATCH 6/6] lavc/pixblockdsp: RISC-V scalar optimisations

2022-09-17 Thread remi
From: Rémi Denis-Courmont Benchmarks: get_pixels_c: 180.0 get_pixels_rvi: 136.7 --- libavcodec/pixblockdsp.c| 2 + libavcodec/pixblockdsp.h| 2 + libavcodec/riscv/Makefile | 2 + libavcodec/riscv/pixblockdsp_init.c | 43 ++ libavcodec/risc

[FFmpeg-devel] [PATCH 4/6] lavu/riscv: initial common header for assembler macros

2022-09-17 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 74 +++ 1 file changed, 74 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..7623c161cf --- /dev/

[FFmpeg-devel] [PATCH 5/6] lavc/audiodsp: add RISC-V F float vector clip

2022-09-17 Thread remi
From: Rémi Denis-Courmont RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarked on SiFive U74-MC: audiodsp.ve

[FFmpeg-devel] [PATCHv2 2/2] lavc/vorbisdec: use intermediate variables

2022-09-19 Thread remi
From: Rémi Denis-Courmont The compiler cannot infer that the two float vectors do not alias, causing unnecessary extra loads and serialisation. This patch caches the two input values in local variables so that the compiler can optimise individual loop iterations. --- libavcodec/vorbisdec.c | 24

[FFmpeg-devel] [PATCHv2 1/2] lavc/vorbisdec: use ptrdiff_t to iterate over intptr_t

2022-09-19 Thread remi
From: Rémi Denis-Courmont While this probably never overflows, we are better safe than sorry. The callback prototype should probably also use ptrdiff_t or size_t, but I digress (this would affect the DSP callback prototype). --- libavcodec/vorbisdec.c | 3 +-- 1 file changed, 1 insertion(+), 2

[FFmpeg-devel] [PATCHv3 1/3] lavc/vorbisdec: use ptrdiff_t to iterate over intptr_t

2022-09-19 Thread remi
From: Rémi Denis-Courmont While this probably never overflows, we are better safe than sorry. The callback prototype should probably also use ptrdiff_t or size_t, but I digress (this would affect the DSP callback prototype). --- libavcodec/ppc/vorbisdsp_altivec.c | 4 ++-- libavcodec/vorbisdec

[FFmpeg-devel] [PATCHv3 2/3] lavc/vorbisdsp: use ptrdiff_t rather than intptr_t

2022-09-19 Thread remi
From: Rémi Denis-Courmont ... for a difference between pointers. --- libavcodec/aarch64/vorbisdsp_init.c | 2 +- libavcodec/arm/vorbisdsp_init_arm.c | 2 +- libavcodec/ppc/vorbisdsp_altivec.c | 2 +- libavcodec/vorbis.h | 2 +- libavcodec/vorbisdec.c | 2 +- libavco

[FFmpeg-devel] [PATCHv3 3/3] lavc/vorbisdec: use intermediate variables

2022-09-19 Thread remi
From: Rémi Denis-Courmont The compiler cannot infer that the two float vectors do not alias, causing unnecessary extra loads and serialisation. This patch caches the two input values in local variables so that the compiler can optimise individual loop iterations. --- libavcodec/vorbisdec.c | 24

[FFmpeg-devel] [PATCH 01/26] lavu/cpu: detect RISC-V base extensions

2022-09-20 Thread remi
From: Rémi Denis-Courmont This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But a

[FFmpeg-devel] [PATCH 02/26] lavu/riscv: initial common header for assembler macros

2022-09-20 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 77 +++ 1 file changed, 77 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..dbd97f40a4 --- /dev/
