On 8/3/23 13:37, Mariam Harutyunyan via Gcc-patches wrote:
This patch adds CRC support for the RISC-V architecture. It adds internal functions and built-ins specifically designed to handle CRC computations efficiently. If the target is ZBC, the clmul instruction is used for the CRC code generation; otherwise, table-based CRC is generated. A table with 256 elements is used to store precomputed CRCs. These CRC calculation algorithms have higher performance than the naive CRC calculation algorithm.
[ ... ] Various comments attached.
From 9d2e9023c222501a1d9519bea3d5cdbd32b5a91e Mon Sep 17 00:00:00 2001 From: Mariam Arutunian <mariamarutun...@gmail.com> Date: Thu, 3 Aug 2023 15:59:57 +0400 Subject: [PATCH] RISC-V: Added support for CRC. If the target is ZBC, then the clmul instruction is used for the CRC code generation; otherwise, table-based CRC is generated. A table with 256 elements is used to store precomputed CRCs. gcc/ChangeLog: *builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define. (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise. * builtins.cc (associated_internal_fn): Handle BUILT_IN_CRC8_DATA8, BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16, BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32, BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32, BUILT_IN_CRC64_DATA64. * builtins.def (BUILT_IN_CRC8_DATA8): New builtin. (BUILT_IN_CRC16_DATA8): Likewise. (BUILT_IN_CRC16_DATA16): Likewise. (BUILT_IN_CRC32_DATA8): Likewise. (BUILT_IN_CRC32_DATA16): Likewise. (BUILT_IN_CRC32_DATA32): Likewise. (BUILT_IN_CRC64_DATA8): Likewise. (BUILT_IN_CRC64_DATA16): Likewise. (BUILT_IN_CRC64_DATA32): Likewise. (BUILT_IN_CRC64_DATA64): Likewise. * config/riscv/bitmanip.md (crc<ANYI2:mode><ANYI:mode>4): New expander. * config/riscv/riscv-protos.h (expand_crc_table_based): Declare. (expand_crc_using_clmul): Likewise. * config/riscv/riscv.cc (gf2n_poly_long_div_quotient): New function. (generate_crc): Likewise. (generate_crc_table): Likewise. (expand_crc_table_based): Likewise. (expand_crc_using_clmul): Likewise. * config/riscv/riscv.md (UNSPEC_CRC): New unspec for CRC. * internal-fn.cc (crc_direct): Define. (expand_crc_optab_fn): New function. (direct_crc_optab_supported_p): Define. * internal-fn.def (CRC): New internal optab function. * optabs.def (crc_optab): New optab. gcc/testsuite/ChangeLog: * gcc.target/riscv/crc-builtin-table-target32.c: New test. * gcc.target/riscv/crc-builtin-table-target64.c: New test. * gcc.target/riscv/crc-builtin-zbc32.c: New test. * gcc.target/riscv/crc-builtin-zbc64.c: New test. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2eab466a9f8..748d8be384b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog In general, the ChangeLog entry is just sent like you've done above and not included as a diff. diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 43381bc8949..e33837c27d0 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -829,6 +829,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE, BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8, BT_PTRMODE) +DEF_FUNCTION_TYPE_3 (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE, BT_UINT8, BT_UINT8, + BT_UINT8, BT_CONST_SIZE) [ ... ] Presumably the reason we need to many variants is due to the desire to support various types for the inputs and outputs? diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index c42e7b890db..4c896303242 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -856,3 +856,38 @@ "TARGET_ZBC" "clmulr\t%0,%1,%2" [(set_attr "type" "clmul")]) + +;; Iterator for hardware-supported integer modes, same as ANYI +(define_mode_iterator ANYI2 [QI HI SI (DI "TARGET_64BIT")]) + +;; CRC 8, 16, 32, (64 for TARGET_64) +(define_expand "crc<ANYI2:mode><ANYI:mode>4" + ;; return value (calculated CRC) + [(set (match_operand:ANYI 0 "register_operand" "=r") + ;; initial CRC + (unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r") + ;; data + (match_operand:ANYI2 2 "register_operand" "r") + ;; polynomial + (match_operand:ANYI 3)] + UNSPEC_CRC))] I'm not real comfortable with directly supporting sub-word sizes for the output operand. The vast majority of insns on the risc-v port have outputs that use either the X iterator (which maps to either SI or DI mode for rv32 and rv64 respectively) or they use the GPR iterator. I don't think many us ANYI. Which ultimately makes sense since operations actually write X mode bits. Presumably the goal here is to capture cases where the resultant CRC value is a sub-word size. +"" +{ + /* TODO: We only support cases where the data size is not greater + than the CRC size. */ + if (GET_MODE (operands[0]) >= GET_MODE (operands[2])) + { + /* If we have the ZBC extension (ie, clmul) and + it is possible to store the quotient within a single variable + (E.g. CRC64's quotient may need 65 bits, + we can't keep it in 64 bit variable.) + then use clmul instruction to implement the CRC, + otherwise generate table-based CRC. */ + if (TARGET_ZBC && ((TARGET_64BIT && GET_MODE (operands[0]) != DImode) + || (!TARGET_64BIT && GET_MODE (operands[0]) < SImode))) Formatting NIT. We'd tend to prefer to write this as if (TARGET_ZBC && ((TARGET_64BIT && GET_MODE (operands[0]) != DImode) || (!TARGET_64BIT && GET_MODE (operands[0]) < SImode))) But I think the condition you're really looking for is if (TARGET_ZBC && GET_MODE (operands[0]) == word_mode) Meaning the output is 32bits for rv32 or 64bits for rv64. If we fix the mode in the expander pattern to use X, then this condition would simplify to just (TARGET_ZBC) since the output would be known to be word sized already. + expand_crc_using_clmul (operands); + else + expand_crc_table_based (operands); + } + DONE; +}) \ No newline at end of file diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index c9520f689e2..35bf19806df 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -131,6 +131,8 @@ extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT); extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *); extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *); extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel); +extern void expand_crc_table_based (rtx *); +extern void expand_crc_using_clmul (rtx *); I wonder if all the table based generation support code should move into a generic file so that it could be used by other targets. I don't offhand see anything in there that is RISC-V specific. /* Routines implemented in riscv-c.cc. */ void riscv_cpu_cpp_builtins (cpp_reader *); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 332fa720f01..e15850910a2 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc +/* Generates CRC lookup table by calculating CRC for all possible + 8-bit data values. The table is stored with a specific name in the read-only + data section. */ + +rtx +generate_crc_table (unsigned HOST_WIDE_INT polynom, unsigned crc_bits) +{ + gcc_assert (crc_bits <= 64); + FILE *out = asm_out_file; + + /* Buf size - 33 letters + + 18 for numbers (2 for crc bit size + 2 for 0x + 16 for 64 bit polynomial) + + 1 for \0. */ + char buf[33 + 20 + 1]; + sprintf (buf, "crc_table_for_%u_bit_crc_%llx_polynomial", crc_bits, polynom); + tree id = maybe_get_identifier (buf); + if (id) + return gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id)); + id = get_identifier (buf); So I think we need to be more careful here with namespace pollution. Ideally I think we'd want this symbol to be static and const. + rtx tab = gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id)); + + /* Create a table with 256 elements. */ + unsigned table_elements_n = 0x100; + char val_align_op[7]; + if (crc_bits <= 8) + sprintf (val_align_op, ".byte"); + else if (crc_bits <= 16) + sprintf (val_align_op, ".half"); + else if (crc_bits <= 32) + sprintf (val_align_op, ".word"); + else + sprintf (val_align_op, ".dword"); There are standard target hooks for emitting data and a routine that given the size the size of an object will return the right assembler directive. Look for integer_asm_op. More generally, look in varasm.cc for various routines dealing with generating assembler output. + + /* Write in read only data section. */ There are also standard ways to switch sections and standard section names based on the target. Again, varasm.cc will be the right place to wander. Note that if you were to create a DECL node for the table and attach a suitable DECL_INITIAL to it with its initialization data I think most, if not all of the assembly output complexity would be handled for you automatically. If that is unwieldly to implement, then you'll definitely want to be using various helpers from varasm.cc. + for (int i = 0; i < GET_MODE_SIZE (data_mode).to_constant (); i++) + { + /* crc >> (bit_size - 8). */ + rtx op1 = gen_rtx_ASHIFTRT (crc_mode, crc, + GEN_INT (mode_bit_size - 8)); In general, I'd suggest gen_int_mode rather than GEN_INT. + /* crc_mode is the return value's mode, + depends on CRC function's CRC size. */ + if (crc_mode != word_mode) + ix = gen_rtx_ZERO_EXTEND (word_mode, ix); + ix = gen_rtx_ASHIFT (word_mode, ix, GEN_INT (exact_log2 ( + GET_MODE_SIZE (crc_mode).to_constant ()))); + ix = force_reg (word_mode, ix); Hmm, I'm not sure we can depend on the target actually supporting ZERO_EXTEND. Some targets use AND with suitable immediates. Some might use shifts, etc. + + /* crc_table[(crc >> (bit_size - 8)) ^ data_8bit] */ + rtx tab_el = gen_rtx_MEM (crc_mode, gen_rtx_PLUS (word_mode, ix, tab)); Similarly, don't we have to worry about whether or not the resulting address is actually valid? They probably always are on RISC-V as I don't think TAB will require more than 12 bits of offsets to address. But some targets can only do simple register indirect addressing. + + /* (crc << 8) */ + rtx high = gen_rtx_ASHIFT (crc_mode, crc, GEN_INT (8)); And here I don't think we can necessarily depend on the target supporting shifts by a constant amount. There's certainly targets out there that can only shift one or two positions at a time. Those targets typically lie and claim they have more general shifting capabilities, but the point is we need to be much more careful about the RTL we generate. + if (crc_mode != word_mode) + high = force_reg (crc_mode, + gen_rtx_AND (crc_mode, high, + GEN_INT (GET_MODE_MASK (crc_mode)))); Similarly here, some targets may not support AND with immediate or may not have enough bits to handle all the immediates that can arise here. + /* crc = (crc << 8) ^ crc_table[(crc >> (bit_size - 8)) ^ data_8bit]; */ + crc = force_reg (crc_mode, gen_rtx_XOR (crc_mode, tab_el, high)); + } + riscv_emit_move (operands[0], crc); CRC is a REG. Isn't operands[0] also a REG? Can we just use emit_move_insn here rather than riscv_emit_move? Conceptually this code should e target independent, so embedding risc-v isms in here will make it harder to use on other targets. @@ -3996,6 +4028,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, #define direct_cond_len_unary_optab_supported_p direct_optab_supported_p #define direct_cond_len_binary_optab_supported_p direct_optab_supported_p #define direct_cond_len_ternary_optab_supported_p direct_optab_supported_p +#define direct_crc_optab_supported_p convert_optab_supported_p So did we ever figure out why we're treating the CRC optab as a conversion optab?