Oh my, I forgot to change the subject of PATCH 0/2. That should have been "RISC-V: Constant FP Optimization with 'Zfa'", the same subject as PATCH 2/2.
Sorry for the confusion!

On 2023/08/14 14:32, Tsukasa OI wrote:
> Hello,
>
> and... I think this might be my first *large* patch set for GCC
> contribution and definitely the first one to touch the machine description.
>
> So, please review it carefully.
>
>
> Background
> ===========
>
> This patch set adds an optimization to FP constant initialization using a
> FLI instruction, which is a part of the 'Zfa' extension which provides
> additional floating-point instructions.
>
> FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
> binary64 and "fli.q" for binary128 [which can be ignored because current
> GCC for RISC-V does not natively support binary128]) provide a
> load-immediate operation for the following 32 immediates.
>
> | Binary Encoding | Immediate (and its part of binary representation) |
> | --------------- | ------------------------------------------------- |
> | `00000` ( 0)    | -1.0 (-0b1.00 * 2^(+ 0))                          |
> | `00001` ( 1)    | Minimum positive normal value                     |
> |                 | sign=[0] exponent=[0..01] significand=[000..000]  |
> | `00010` ( 2)    | 1.00*2^(-16) (+0b1.00 * 2^(-16))                  |
> | `00011` ( 3)    | 1.00*2^(-15) (+0b1.00 * 2^(-15))                  |
> | `00100` ( 4)    | 1.00*2^(- 8) (+0b1.00 * 2^(- 8))                  |
> | `00101` ( 5)    | 1.00*2^(- 7) (+0b1.00 * 2^(- 7))                  |
> | `00110` ( 6)    | 1.00*2^(- 4) (+0b1.00 * 2^(- 4)) = 0.0625         |
> | `00111` ( 7)    | 1.00*2^(- 3) (+0b1.00 * 2^(- 3)) = 0.125          |
> | `01000` ( 8)    | 1.00*2^(- 2) (+0b1.00 * 2^(- 2)) : 0.25           |
> | `01001` ( 9)    | 1.25*2^(- 2) (+0b1.01 * 2^(- 2)) : 0.3125         |
> | `01010` (10)    | 1.50*2^(- 2) (+0b1.10 * 2^(- 2)) : 0.375          |
> | `01011` (11)    | 1.75*2^(- 2) (+0b1.11 * 2^(- 2)) : 0.4375         |
> | `01100` (12)    | 1.00*2^(- 1) (+0b1.00 * 2^(- 1)) : 0.5            |
> | `01101` (13)    | 1.25*2^(- 1) (+0b1.01 * 2^(- 1)) : 0.625          |
> | `01110` (14)    | 1.50*2^(- 1) (+0b1.10 * 2^(- 1)) : 0.75           |
> | `01111` (15)    | 1.75*2^(- 1) (+0b1.11 * 2^(- 1)) : 0.875          |
> | `10000` (16)    | 1.00*2^(+ 0) (+0b1.00 * 2^(+ 0)) : 1.0            |
> | `10001` (17)    | 1.25*2^(+ 0) (+0b1.01 * 2^(+ 0)) : 1.25           |
> | `10010` (18)    | 1.50*2^(+ 0) (+0b1.10 * 2^(+ 0)) : 1.5            |
> | `10011` (19)    | 1.75*2^(+ 0) (+0b1.11 * 2^(+ 0)) : 1.75           |
> | `10100` (20)    | 1.00*2^(+ 1) (+0b1.00 * 2^(+ 1)) : 2.0            |
> | `10101` (21)    | 1.25*2^(+ 1) (+0b1.01 * 2^(+ 1)) : 2.5            |
> | `10110` (22)    | 1.50*2^(+ 1) (+0b1.10 * 2^(+ 1)) : 3.0            |
> | `10111` (23)    | 1.00*2^(+ 2) (+0b1.00 * 2^(+ 2)) = 4              |
> | `11000` (24)    | 1.00*2^(+ 3) (+0b1.00 * 2^(+ 3)) = 8              |
> | `11001` (25)    | 1.00*2^(+ 4) (+0b1.00 * 2^(+ 4)) = 16             |
> | `11010` (26)    | 1.00*2^(+ 7) (+0b1.00 * 2^(+ 7)) = 128            |
> | `11011` (27)    | 1.00*2^(+ 8) (+0b1.00 * 2^(+ 8)) = 256            |
> | `11100` (28)    | 1.00*2^(+15) (+0b1.00 * 2^(+15)) = 32768          |
> | `11101` (29)    | 1.00*2^(+16) (+0b1.00 * 2^(+16)) = 65536          |
> |                 | On "fli.h", this is equivalent to positive inf.   |
> | `11110` (30)    | Positive infinity                                 |
> |                 | sign=[0] exponent=[1..11] significand=[000..000]  |
> | `11111` (31)    | Canonical NaN (positive, quiet and zero payload)  |
> |                 | sign=[0] exponent=[1..11] significand=[100..000]  |
>
> Currently, initializing an FP constant (except zero) involves memory, and
> FLI instructions can reduce that memory use.
>
> There may be room to generate more complex constants with multiple FLI
> instructions (e.g. like long integer constants) but, for a start, we can
> begin with optimizing one FP constant initialization with one FLI
> instruction (and because FP arithmetic often has higher latency, the
> benefit of a multi-FLI sequence is not as high as it is for integers).
>
>
> FLI FP constant checking
> =========================
>
> An instruction with a similar role to RISC-V's FLI instructions is the
> Arm/AArch64's vmov.f32 instruction.  It provides a load-immediate operation
> for constants that can be represented in the following form:
>
>> (-1)^s * 0b1.xxxx * 2^r (where -3 <= r <= +4; fits in 3-bits)
>
> This patch is largely influenced by AArch64's handling but
> compared to this, handling RISC-V's FLI FP constant can be a little tricky.
>
> * FLI normally generates only values with sign bit 0 except the binary
>   encoding 0 (which loads -1.0 with sign bit 1).
> * Besides finite values, FLI can generate positive infinity and
>   canonical NaN.
> * Because FLI can generate canonical NaN, handling NaN is preferred but
>   FLI only generates canonical NaN.  Since we can easily create a
>   non-canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
>   direct return value of a function, we must reject non-canonical NaNs
>   (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
> * The exponent range and mantissa constraint are a bit tricky.
>   On binary encodings 8-22, it looks like 0b1.xx * 2^r (where -2 <= r <= 1)
>   but we have to explicitly reject 0b1.11 * 2^1 (that is 3.5) because
>   the value 3.5 is not in the list.
>   The other 1.00 * 2^r values use non-contiguous values of r.
> * Binary encoding 1 (minimum positive normal value for corresponding
>   type) depends on the type (or mode) we are on.
> * The assembler accepts three string operands: "min", "inf" and "nan".
>
> Handling those like aarch64_float_const_representable_p can be
> inefficient.  So, I implemented the riscv_get_float_fli_const function,
> which returns composite information about a FLI constant (including
> whether the constant is valid as a FLI constant).
>
> This composite information contains:
>
> 1. Validity
> 2. Sign bit (only set for -1.0)
> 3. FLI constant type ("min", "inf", "nan" or a finite number other than
>    "min")
> 4. Highest two bits of mantissa under the point (xx for 0b1.xx)
>    on a finite value except "min".
> 5. Biased exponent (yet sparse representation to make handling easier)
>    on a finite value except "min".  For 0b1.xx * 2^r, (r+16) is stored.
>    Valid range of this is [0, 32] (inclusive) so it requires 6 bits.
>
> On many ABIs, this information is packed into an integer-sized bitfield.
>
>
> New Constraint: "H"
> ====================
>
> According to the GCC Internals documentation, "H" (along with "G") may be
> defined in a machine-dependent fashion to permit immediate floating
> operands in particular ranges of values.  Because "G" is already used to
> represent +0.0, this patch set uses "H" for FLI-capable FP constants.
>
> It adds one variant to each of the following move patterns:
>
> * movhf_hardfloat
> * movsf_hardfloat
> * movdf_hardfloat_rv32
> * movdf_hardfloat_rv64
>
> Note that the 'Zfa' extension requires the 'F' extension (which is the
> hard float).
>
>
>
> Portions that I'm not sure whether they are okay
> =================================================
>
> * NaN handling (comparison with canonical NaN)
>   Due to constraints, I had to compare a NaN's binary representation
>   with the known canonical NaN of IEEE 754 binary16/32/64, but
>   is there any better way to do this?
> * Any ICE possibility?
>   For simple programs, I confirmed that no ICE occurs but I'm not sure
>   whether this applies to other programs.  If I miss some cases in the
>   riscv_output_move or riscv_print_operand functions (corresponding
>   mov instructions in riscv.md), it can easily cause an ICE.
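On the NaN question above, the comparison being described can be sketched
as follows.  This is only an illustration on host float/double values
(assumed to be IEEE 754 binary32/binary64), not the code in the patch, and
all names here are hypothetical.  The patterns follow the table's
definition of canonical NaN: sign 0, exponent all ones, and only the top
significand bit set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Canonical NaN bit patterns (positive, quiet, zero payload).  */
#define CANONICAL_NAN_BINARY16 UINT16_C(0x7E00)
#define CANONICAL_NAN_BINARY32 UINT32_C(0x7FC00000)
#define CANONICAL_NAN_BINARY64 UINT64_C(0x7FF8000000000000)

/* Hypothetical helpers to build a value from its raw bit pattern.  */
float f32_from_bits(uint32_t bits)
{
    float v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

double f64_from_bits(uint64_t bits)
{
    double v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* True only if V is bit-for-bit the canonical NaN.  A plain isnan()
   test is not enough because it also accepts NaNs with a payload or
   with the sign bit set.  */
bool is_canonical_nan_f32(float v)
{
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    return bits == CANONICAL_NAN_BINARY32;
}

bool is_canonical_nan_f64(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return bits == CANONICAL_NAN_BINARY64;
}
```

With this, a NaN carrying any payload, or a negative NaN, is rejected,
which is the behavior needed to avoid emitting "fli.d fa0,nan" for a
non-canonical NaN.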
>
>
> Sincerely,
> Tsukasa
>
>
>
>
> Tsukasa OI (2):
>   RISC-V: Add support for the 'Zfa' extension
>   RISC-V: Constant FP Optimization with 'Zfa'
>
>  gcc/common/config/riscv/riscv-common.cc    |   3 +
>  gcc/config/riscv/constraints.md            |   7 +
>  gcc/config/riscv/riscv-opts.h              |   2 +
>  gcc/config/riscv/riscv-protos.h            |  34 +++
>  gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
>  gcc/config/riscv/riscv.md                  |  24 +-
>  gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
>  14 files changed, 697 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
>
>
> base-commit: 614052dd4ea083e086712809c754ffebd9361316