Re: [PATCH 0/2] RISC-V: Constant FP Optimization with 'Zfa'

Tsukasa OI via Gcc-patches Sun, 13 Aug 2023 23:19:50 -0700

Oh my, I forgot to change the subject of PATCH 0/2.
That should have been "RISC-V: Constant FP Optimization with 'Zfa'", the
same subject as PATCH 2/2.


Sorry for the confusion!

On 2023/08/14 14:32, Tsukasa OI wrote:
> Hello,
> 
> and... I think this might be my first *large* patch set for GCC
> contribution and definitely the first one to touch the machine description.
> 
> So, please review it carefully.
> 
> 
> Background
> ===========
> 
> This patch set adds an optimization to FP constant initialization using an
> FLI instruction, which is part of the 'Zfa' extension (additional
> floating-point instructions).
> 
> FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
> binary64 and "fli.q" for binary128 [which can be ignored because current
> GCC for RISC-V does not natively support binary128]) provide a
> load-immediate operation for the following 32 immediates.
> 
> | Binary Encoding | Immediate (and its part of binary representation) |
> | --------------- | --------------------------------------------------|
> |    `00000` ( 0) | -1.0          (-0b1.00 * 2^(+ 0))                 |
> |    `00001` ( 1) | Minimum positive normal value                     |
> |                 | sign=[0] exponent=[0..01] significand=[000..000]  |
> |    `00010` ( 2) | 1.00*2^(-16)  (+0b1.00 * 2^(-16))                 |
> |    `00011` ( 3) | 1.00*2^(-15)  (+0b1.00 * 2^(-15))                 |
> |    `00100` ( 4) | 1.00*2^(- 8)  (+0b1.00 * 2^(- 8))                 |
> |    `00101` ( 5) | 1.00*2^(- 7)  (+0b1.00 * 2^(- 7))                 |
> |    `00110` ( 6) | 1.00*2^(- 4)  (+0b1.00 * 2^(- 4)) = 0.0625        |
> |    `00111` ( 7) | 1.00*2^(- 3)  (+0b1.00 * 2^(- 3)) = 0.125         |
> |    `01000` ( 8) | 1.00*2^(- 2)  (+0b1.00 * 2^(- 2)) = 0.25          |
> |    `01001` ( 9) | 1.25*2^(- 2)  (+0b1.01 * 2^(- 2)) = 0.3125        |
> |    `01010` (10) | 1.50*2^(- 2)  (+0b1.10 * 2^(- 2)) = 0.375         |
> |    `01011` (11) | 1.75*2^(- 2)  (+0b1.11 * 2^(- 2)) = 0.4375        |
> |    `01100` (12) | 1.00*2^(- 1)  (+0b1.00 * 2^(- 1)) = 0.5           |
> |    `01101` (13) | 1.25*2^(- 1)  (+0b1.01 * 2^(- 1)) = 0.625         |
> |    `01110` (14) | 1.50*2^(- 1)  (+0b1.10 * 2^(- 1)) = 0.75          |
> |    `01111` (15) | 1.75*2^(- 1)  (+0b1.11 * 2^(- 1)) = 0.875         |
> |    `10000` (16) | 1.00*2^(+ 0)  (+0b1.00 * 2^(+ 0)) = 1.0           |
> |    `10001` (17) | 1.25*2^(+ 0)  (+0b1.01 * 2^(+ 0)) = 1.25          |
> |    `10010` (18) | 1.50*2^(+ 0)  (+0b1.10 * 2^(+ 0)) = 1.5           |
> |    `10011` (19) | 1.75*2^(+ 0)  (+0b1.11 * 2^(+ 0)) = 1.75          |
> |    `10100` (20) | 1.00*2^(+ 1)  (+0b1.00 * 2^(+ 1)) = 2.0           |
> |    `10101` (21) | 1.25*2^(+ 1)  (+0b1.01 * 2^(+ 1)) = 2.5           |
> |    `10110` (22) | 1.50*2^(+ 1)  (+0b1.10 * 2^(+ 1)) = 3.0           |
> |    `10111` (23) | 1.00*2^(+ 2)  (+0b1.00 * 2^(+ 2)) = 4             |
> |    `11000` (24) | 1.00*2^(+ 3)  (+0b1.00 * 2^(+ 3)) = 8             |
> |    `11001` (25) | 1.00*2^(+ 4)  (+0b1.00 * 2^(+ 4)) = 16            |
> |    `11010` (26) | 1.00*2^(+ 7)  (+0b1.00 * 2^(+ 7)) = 128           |
> |    `11011` (27) | 1.00*2^(+ 8)  (+0b1.00 * 2^(+ 8)) = 256           |
> |    `11100` (28) | 1.00*2^(+15)  (+0b1.00 * 2^(+15)) = 32768         |
> |    `11101` (29) | 1.00*2^(+16)  (+0b1.00 * 2^(+16)) = 65536         |
> |                 | On "fli.h", this is equivalent to positive inf.   |
> |    `11110` (30) | Positive infinity                                 |
> |                 | sign=[0] exponent=[1..11] significand=[000..000]  |
> |    `11111` (31) | Canonical NaN (positive, quiet and zero payload)  |
> |                 | sign=[0] exponent=[1..11] significand=[100..000]  |
> 
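
As a cross-check of the table above, here is a minimal C sketch of the values "fli.s" (binary32) would load for each 5-bit encoding. The helper name `fli_s_value` is hypothetical and is not part of the patch set; it only restates the table on the host side.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper (not from the patch): the value "fli.s" would load
   for a given 5-bit encoding, following the table above.  */
static float fli_s_value(unsigned encoding)
{
    /* Slots 1, 30 and 31 are placeholders; the switch below handles them.  */
    static const float finite[32] = {
        -1.0f, 0.0f,    0x1p-16f, 0x1p-15f, 0x1p-8f,  0x1p-7f,  0.0625f, 0.125f,
        0.25f, 0.3125f, 0.375f,   0.4375f,  0.5f,     0.625f,   0.75f,   0.875f,
        1.0f,  1.25f,   1.5f,     1.75f,    2.0f,     2.5f,     3.0f,    4.0f,
        8.0f,  16.0f,   128.0f,   256.0f,   32768.0f, 65536.0f, 0.0f,    0.0f
    };

    assert(encoding < 32);
    switch (encoding) {
    case 1:  return FLT_MIN;   /* minimum positive normal binary32 value */
    case 30: return INFINITY;  /* positive infinity */
    case 31: return NAN;       /* a quiet NaN; C does not promise the
                                  canonical bit pattern here */
    default: return finite[encoding];
    }
}
```

For "fli.h" and "fli.d" the same table applies, except that encoding 1 depends on the format and, for "fli.h", encoding 29 overflows to positive infinity as noted in the table.
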
> Currently, initializing an FP constant (except zero) involves a memory
> access, and FLI instructions can reduce such accesses.
> 
> There may be room to generate more complex constants with multiple FLI
> instructions (as is done for long integer constants), but for starters we
> can begin by optimizing one FP constant initialization with one FLI
> instruction (and because FP arithmetic often has higher latency, the
> benefit of multi-FLI sequences is not as high as it is for integers).
> 
> 
> FLI FP constant checking
> =========================
> 
> An instruction with a similar role to RISC-V's FLI instructions is the
> Arm/AArch64 vmov.f32 instruction. It provides a load-immediate operation
> for constants that can be represented in the following form:
> 
>> (-1)^s * 0b1.xxxx * 2^r   (where -3 <= r <= +4; fits in 3-bits)
> 
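
To make the form quoted above concrete, the following sketch tests whether a host `double` has the shape (-1)^s * 0b1.xxxx * 2^r with -3 <= r <= +4. It assumes the host `double` is IEEE 754 binary64; the function name `fits_1xxxx_form` is made up for illustration and is unrelated to the actual AArch64 back-end code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check, assuming IEEE 754 binary64 for double:
   is x equal to (-1)^s * 0b1.xxxx * 2^r with -3 <= r <= 4?  */
static bool fits_1xxxx_form(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* Unbiased exponent.  Zero and subnormals give -1023, infinities and
       NaNs give +1024, so the range check below rejects all of them.  */
    int r = (int)((bits >> 52) & 0x7ff) - 1023;

    /* Only the top 4 of the 52 fraction bits may be set.  */
    uint64_t fraction = bits & ((UINT64_C(1) << 52) - 1);
    bool four_bits_only = (fraction & ((UINT64_C(1) << 48) - 1)) == 0;

    /* The sign bit is ignored because either sign is allowed.  */
    return r >= -3 && r <= 4 && four_bits_only;
}
```

For example, 31.0 (0b1.1111 * 2^4) and -2.5 (0b1.0100 * 2^1) fit this form, while 32.0 and 0.0625 fall outside the exponent range.
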
> This patch is largely influenced by AArch64's handling, but handling
> RISC-V's FLI FP constants is a little trickier by comparison.
> 
> *   FLI normally generates only values with sign bit 0 except the binary
>     encoding 0 (which loads -1.0 with sign bit 1).
> *   In addition to finite values, FLI can generate positive infinity and
>     canonical NaN.
> *   Because FLI can generate canonical NaN, handling NaN is preferred but
>     FLI only generates canonical NaN.  Since we can easily create a non-
>     canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
>     direct return value of a function, we must reject non-canonical NaNs
>     (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
> *   The exponent range and mantissa constraints are a bit tricky.
>     For binary encodings 8-22, the value looks like 0b1.xx * 2^r (where
>     -2 <= r <= 1), but we have to explicitly reject 0b1.11 * 2^1 (that is
>     3.5) because the value 3.5 is not in the list.
>     The other 1.00 * 2^r values have discontinuous r.
> *   Binary encoding 1 (minimum positive normal value for corresponding
>     type) depends on the type (or mode) we are on.
> *   Assembler accepts three string operands: "min", "inf" and "nan".
> 
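
The constraints in the list above can be sketched for binary32 as a single lookup from a value to its 5-bit encoding. This is only an illustration under stated assumptions: it works on the bits of a host `float` (assumed to be IEEE 754 binary32), whereas the patch operates on GCC's internal constant representation, and the name `fli_s_encoding` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: return the "fli.s" encoding (0..31) for x,
   or -1 if x is not one of the 32 FLI immediates.  */
static int fli_s_encoding(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    if (bits == UINT32_C(0xbf800000)) return 0;   /* -1.0, the only negative */
    if (bits >> 31) return -1;                    /* any other negative value */
    if (bits == UINT32_C(0x00800000)) return 1;   /* "min" (type dependent) */
    if (bits == UINT32_C(0x7f800000)) return 30;  /* "inf" */
    if (bits == UINT32_C(0x7fc00000)) return 31;  /* canonical "nan" only */

    uint32_t biased = bits >> 23;                 /* sign bit is 0 here */
    uint32_t fraction = bits & UINT32_C(0x7fffff);
    if (biased == 0 || biased == 0xff) return -1; /* zero, subnormal, other NaN */
    if (fraction & UINT32_C(0x1fffff)) return -1; /* more than 2 fraction bits */

    int r = (int)biased - 127;
    unsigned xx = fraction >> 21;                 /* xx of 0b1.xx */

    if (r >= -2 && r <= 1) {
        if (r == 1 && xx == 3) return -1;         /* 3.5 is not in the list */
        return 8 + (r + 2) * 4 + (int)xx;         /* encodings 8..22 */
    }
    if (xx != 0) return -1;                       /* outside 8..22: 1.00 * 2^r only */
    switch (r) {                                  /* discontinuous exponents */
    case -16: return 2;  case -15: return 3;  case -8: return 4;
    case -7:  return 5;  case -4:  return 6;  case -3: return 7;
    case 2:   return 23; case 3:   return 24; case 4:  return 25;
    case 7:   return 26; case 8:   return 27; case 15: return 28;
    case 16:  return 29;
    default:  return -1;
    }
}
```

With this sketch, 3.0f maps to encoding 22 while 3.5f, -2.0f and 0.0f are all rejected, which mirrors the special cases called out in the list.
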
> Handling all of these the way aarch64_float_const_representable_p does
> can be inefficient.  So, I implemented the riscv_get_float_fli_const
> function, which returns composite information about an FLI constant
> (including whether the value is a valid FLI constant at all).
> 
> This composite information contains:
> 
> 1.  Validity
> 2.  Sign bit (only set for -1.0)
> 3.  FLI constant type ("min", "inf", "nan" or a finite number other than
>     "min")
> 4.  Highest two bits of the mantissa below the binary point (xx for
>     0b1.xx), for a finite value other than "min".
> 5.  Biased exponent (a sparse representation, to make handling easier),
>     for a finite value other than "min".  For 0b1.xx * 2^r, (r+16) is
>     stored.  The valid range is [0, 32] (inclusive), so it requires 6 bits.
> 
> On many ABIs, this information is packed into an integer-sized bit-field.
> 
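
One possible layout for the five items above is sketched below as a C bit-field. The struct, enum and field names are assumptions for illustration only; the actual definition in the patch may differ. The fields total 12 bits, which is why the result can fit in one integer on many ABIs.

```c
#include <assert.h>

/* Hypothetical names; the real patch may use a different layout.  */
enum fli_kind { FLI_FINITE, FLI_MIN, FLI_INF, FLI_NAN };

struct fli_info {
    unsigned valid : 1;            /* 1. validity */
    unsigned sign : 1;             /* 2. sign bit (only set for -1.0) */
    unsigned kind : 2;             /* 3. enum fli_kind */
    unsigned mantissa_below_point : 2; /* 4. xx of 0b1.xx */
    unsigned biased_exponent : 6;  /* 5. r + 16, in [0, 32] */
};

/* Build the record for a finite, non-"min" constant 0b1.xx * 2^r.  */
static struct fli_info fli_info_finite(int r, unsigned xx, int negative)
{
    struct fli_info info = { 0 };

    assert(r >= -16 && r <= 16 && xx < 4);
    info.valid = 1;
    info.sign = negative != 0;
    info.kind = FLI_FINITE;
    info.mantissa_below_point = xx;
    info.biased_exponent = (unsigned)(r + 16);
    return info;
}
```

For 3.0 (0b1.10 * 2^1) this stores xx = 2 and a biased exponent of 17; the extremes r = -16 and r = +16 store 0 and 32, which is what forces the sixth exponent bit.
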
> 
> New Constraint: "H"
> ====================
> 
> According to the GCC Internals documentation, "H" (along with "G") is
> defined in a machine-dependent fashion to permit immediate floating
> operands in particular ranges of values.  Because "G" is already used to
> represent +0.0, this patch set uses "H" for FLI-capable FP constants.
> 
> It adds one alternative to each of the following move patterns:
> 
> *   movhf_hardfloat
> *   movsf_hardfloat
> *   movdf_hardfloat_rv32
> *   movdf_hardfloat_rv64
> 
> Note that the 'Zfa' extension requires the 'F' extension (which is the
> hard float).
> 
> 
> 
> Portions that I'm not sure whether they are okay
> =================================================
> 
> *   NaN handling (comparison with canonical NaN)
>     Due to constraints, I had to compare a NaN against the known binary
>     representations of the IEEE 754 binary16/32/64 canonical NaNs, but
>     is there any better way to perform this?
> *   Any ICE possibility?
>     For simple programs, I confirmed that no ICE occurs, but I'm not sure
>     whether this holds for other programs.  If I missed some cases in the
>     riscv_output_move or riscv_print_operand functions (or in the
>     corresponding mov instructions in riscv.md), it can easily cause an ICE.
> 
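
For reference, the bit-pattern comparison described in the NaN item above looks roughly like this when written against host types. This is a sketch assuming IEEE 754 binary32/binary64 for `float` and `double`; the helper names are hypothetical, and the patch itself compares GCC's internal constant representation rather than host floats.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers to build values from raw bit patterns.  */
static float f32_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

static double f64_from_bits(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Canonical NaN: sign 0, exponent all ones, only the top fraction bit set.  */
static bool is_canonical_nan_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits == UINT32_C(0x7fc00000);
}

static bool is_canonical_nan_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits == UINT64_C(0x7ff8000000000000);
}
```

A NaN with a non-zero payload (for example bit pattern 0x7fc00001) or with the sign bit set (0xffc00000) is rejected, which corresponds to the non-canonical NaNs that must not be turned into an FLI instruction.
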
> 
> Sincerely,
> Tsukasa
> 
> 
> 
> 
> Tsukasa OI (2):
>   RISC-V: Add support for the 'Zfa' extension
>   RISC-V: Constant FP Optimization with 'Zfa'
> 
>  gcc/common/config/riscv/riscv-common.cc    |   3 +
>  gcc/config/riscv/constraints.md            |   7 +
>  gcc/config/riscv/riscv-opts.h              |   2 +
>  gcc/config/riscv/riscv-protos.h            |  34 +++
>  gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
>  gcc/config/riscv/riscv.md                  |  24 +-
>  gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
>  14 files changed, 697 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
> 
> 
> base-commit: 614052dd4ea083e086712809c754ffebd9361316
