Re: RISC-V: Added support for CRC.

Jeff Law via Gcc-patches Tue, 15 Aug 2023 22:00:18 -0700


On 8/3/23 13:37, Mariam Harutyunyan via Gcc-patches wrote:

This patch adds CRC support for the RISC-V architecture. It adds internal
functions and built-ins specifically designed to handle CRC computations
efficiently.

If the target is ZBC, the clmul instruction is used for the CRC code
generation; otherwise, table-based CRC is generated.  A table with 256
elements is used to store precomputed CRCs.

These CRC calculation algorithms have higher performance than the naive CRC
calculation algorithm.

[ ... ]
Various comments attached.

From 9d2e9023c222501a1d9519bea3d5cdbd32b5a91e Mon Sep 17 00:00:00 2001
From: Mariam Arutunian <mariamarutun...@gmail.com>
Date: Thu, 3 Aug 2023 15:59:57 +0400
Subject: [PATCH] RISC-V: Added support for CRC.

  If the target is ZBC, then the clmul instruction is used for the CRC code generation;
  otherwise, table-based CRC is generated. A table with 256 elements is used to store precomputed CRCs.

  gcc/ChangeLog:
	*builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
	(BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
	* builtins.cc (associated_internal_fn): Handle BUILT_IN_CRC8_DATA8,
	BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
	BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
	BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
	BUILT_IN_CRC64_DATA64.
	* builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
	(BUILT_IN_CRC16_DATA8): Likewise.
	(BUILT_IN_CRC16_DATA16): Likewise.
	(BUILT_IN_CRC32_DATA8): Likewise.
	(BUILT_IN_CRC32_DATA16): Likewise.
	(BUILT_IN_CRC32_DATA32): Likewise.
	(BUILT_IN_CRC64_DATA8): Likewise.
	(BUILT_IN_CRC64_DATA16): Likewise.
	(BUILT_IN_CRC64_DATA32): Likewise.
	(BUILT_IN_CRC64_DATA64): Likewise.
	* config/riscv/bitmanip.md (crc<ANYI2:mode><ANYI:mode>4): New expander.
	* config/riscv/riscv-protos.h (expand_crc_table_based): Declare.
	(expand_crc_using_clmul): Likewise.
	* config/riscv/riscv.cc (gf2n_poly_long_div_quotient): New function.
	(generate_crc): Likewise.
	(generate_crc_table): Likewise.
	(expand_crc_table_based): Likewise.
	(expand_crc_using_clmul): Likewise.
	* config/riscv/riscv.md (UNSPEC_CRC): New unspec for CRC.
	* internal-fn.cc (crc_direct): Define.
	(expand_crc_optab_fn): New function.
	(direct_crc_optab_supported_p): Define.
	* internal-fn.def (CRC): New internal optab function.
	* optabs.def (crc_optab): New optab.

  gcc/testsuite/ChangeLog:
	* gcc.target/riscv/crc-builtin-table-target32.c: New test.
	* gcc.target/riscv/crc-builtin-table-target64.c: New test.
	* gcc.target/riscv/crc-builtin-zbc32.c: New test.
	* gcc.target/riscv/crc-builtin-zbc64.c: New test.
---
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2eab466a9f8..748d8be384b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
In general, the ChangeLog entry is just sent like you've done above and
not included as a diff.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 43381bc8949..e33837c27d0 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -829,6 +829,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
 		     BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8,
 		     BT_PTRMODE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE, BT_UINT8, BT_UINT8,
+		     BT_UINT8, BT_CONST_SIZE)
[ ... ]
Presumably the reason we need to many variants is due to the desire to
support various types for the inputs and outputs?


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index c42e7b890db..4c896303242 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -856,3 +856,38 @@
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+;; Iterator for hardware-supported integer modes, same as ANYI
+(define_mode_iterator ANYI2 [QI HI SI (DI "TARGET_64BIT")])
+
+;; CRC 8, 16, 32, (64 for TARGET_64)
+(define_expand "crc<ANYI2:mode><ANYI:mode>4"
+	;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+		      ;; initial CRC
+	(unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+		      ;; data
+		      (match_operand:ANYI2 2 "register_operand" "r")
+		      ;; polynomial
+		      (match_operand:ANYI 3)]
+		      UNSPEC_CRC))]
I'm not real comfortable with directly supporting sub-word sizes for the
output operand.  The vast majority of insns on the risc-v port have
outputs that use either the X iterator (which maps to either SI or DI
mode for rv32 and rv64 respectively) or they use the GPR iterator.  I
don't think many us ANYI.

Which ultimately makes sense since operations actually write X mode
bits.

Presumably the goal here is to capture cases where the resultant CRC
value is a sub-word size.  




+""
+{
+  /* TODO: We only support cases where the data size is not greater
+     than the CRC size.  */
+  if (GET_MODE (operands[0]) >= GET_MODE (operands[2]))
+    {
+      /* If we have the ZBC extension (ie, clmul) and
+	 it is possible to store the quotient within a single variable
+	 (E.g.  CRC64's quotient may need 65 bits,
+	  we can't keep it in 64 bit variable.)
+	 then use clmul instruction to implement the CRC,
+	 otherwise generate table-based CRC.  */
+      if (TARGET_ZBC && ((TARGET_64BIT && GET_MODE (operands[0]) != DImode)
+			|| (!TARGET_64BIT && GET_MODE (operands[0]) < SImode)))
Formatting NIT.  We'd tend to prefer to write this as

if (TARGET_ZBC
    && ((TARGET_64BIT && GET_MODE (operands[0]) != DImode)
        || (!TARGET_64BIT && GET_MODE (operands[0]) < SImode)))

But I think the condition you're really looking for is
if (TARGET_ZBC && GET_MODE (operands[0]) == word_mode)

Meaning the output is 32bits for rv32 or 64bits for rv64.  If we fix the
mode in the expander pattern to use X, then this condition would
simplify to just (TARGET_ZBC) since the output would be known to be word
sized already.






+	expand_crc_using_clmul (operands);
+      else
+	expand_crc_table_based (operands);
+    }
+  DONE;
+})
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c9520f689e2..35bf19806df 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -131,6 +131,8 @@ extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
+extern void expand_crc_table_based (rtx *);
+extern void expand_crc_using_clmul (rtx *);
I wonder if all the table based generation support code should move into a
generic file so that it could be used by other targets.    I don't offhand
see anything in there that is RISC-V specific.



 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 332fa720f01..e15850910a2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
+/* Generates CRC lookup table by calculating CRC for all possible
+   8-bit data values.  The table is stored with a specific name in the read-only
+   data section.  */
+
+rtx
+generate_crc_table (unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+  gcc_assert (crc_bits <= 64);
+  FILE *out = asm_out_file;
+
+  /* Buf size - 33 letters
+     + 18 for numbers (2 for crc bit size + 2 for 0x + 16 for 64 bit polynomial)
+     + 1 for \0.  */
+  char buf[33 + 20 + 1];
+  sprintf (buf, "crc_table_for_%u_bit_crc_%llx_polynomial", crc_bits, polynom);
+  tree id = maybe_get_identifier (buf);
+  if (id)
+    return gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id));
+  id = get_identifier (buf);
So I think we need to be more careful here with namespace pollution.
Ideally I think we'd want this symbol to be static and const. 




+  rtx tab = gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id));
+
+  /* Create a table with 256 elements.  */
+  unsigned table_elements_n = 0x100;
+  char val_align_op[7];
+  if (crc_bits <= 8)
+    sprintf (val_align_op, ".byte");
+  else if (crc_bits <= 16)
+    sprintf (val_align_op, ".half");
+  else if (crc_bits <= 32)
+    sprintf (val_align_op, ".word");
+  else
+    sprintf (val_align_op, ".dword");
There are standard target hooks for emitting data and a routine that
given the size the size of an object will return the right assembler
directive.  Look for integer_asm_op.

More generally, look in varasm.cc for various routines dealing with
generating assembler output.




+
+  /* Write in read only data section.  */
There are also standard ways to switch sections and standard section
names based on the target.  Again, varasm.cc will be the right place to
wander. 

Note that if you were to create a DECL node for the table and attach a
suitable DECL_INITIAL to it with its initialization data I think most,
if not all of the assembly output complexity would be handled for you
automatically.

If that is unwieldly to implement, then you'll definitely want to be
using various helpers from varasm.cc.

+  for (int i = 0; i < GET_MODE_SIZE (data_mode).to_constant (); i++)
+    {
+      /* crc >> (bit_size - 8).  */
+      rtx op1 = gen_rtx_ASHIFTRT (crc_mode, crc,
+				  GEN_INT (mode_bit_size - 8));
In general, I'd suggest gen_int_mode rather than GEN_INT.


+      /* crc_mode is the return value's mode,
+	depends on CRC function's CRC size.  */
+      if (crc_mode != word_mode)
+	ix = gen_rtx_ZERO_EXTEND (word_mode, ix);
+      ix = gen_rtx_ASHIFT (word_mode, ix, GEN_INT (exact_log2 (
+	  GET_MODE_SIZE (crc_mode).to_constant ())));
+      ix = force_reg (word_mode, ix);
Hmm, I'm not sure we can depend on the target actually supporting
ZERO_EXTEND.  Some targets use AND with suitable immediates.  Some might
use shifts, etc.


+
+      /* crc_table[(crc >> (bit_size - 8)) ^ data_8bit]  */
+      rtx tab_el = gen_rtx_MEM (crc_mode, gen_rtx_PLUS (word_mode, ix, tab));
Similarly, don't we have to worry about whether or not the resulting
address is actually valid?  They probably always are on RISC-V as I
don't think TAB will require more than 12 bits of offsets to address.
But some targets can only do simple register indirect addressing.


+
+      /* (crc << 8)  */
+      rtx high = gen_rtx_ASHIFT (crc_mode, crc, GEN_INT (8));
And here I don't think we can necessarily depend on the target
supporting shifts by a constant amount.  There's certainly targets out
there that can only shift one or two positions at a time.  Those targets
typically lie and claim they have more general shifting capabilities,
but the point is we need to be much more careful about the RTL we
generate.


+      if (crc_mode != word_mode)
+	 high = force_reg (crc_mode,
+			   gen_rtx_AND (crc_mode, high,
+					GEN_INT (GET_MODE_MASK (crc_mode))));
Similarly here, some targets may not support AND with immediate or may
not have enough bits to handle all the immediates that can arise here.

+      /* crc = (crc << 8) ^ crc_table[(crc >> (bit_size - 8)) ^ data_8bit];  */
+      crc = force_reg (crc_mode, gen_rtx_XOR (crc_mode, tab_el, high));
+    }
+  riscv_emit_move (operands[0], crc);
CRC is a REG.  Isn't operands[0] also a REG?  Can we just use
emit_move_insn here rather than riscv_emit_move?  Conceptually this code
should e target independent, so embedding risc-v isms in here will make
it harder to use on other targets.

@@ -3996,6 +4028,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_cond_len_unary_optab_supported_p direct_optab_supported_p
 #define direct_cond_len_binary_optab_supported_p direct_optab_supported_p
 #define direct_cond_len_ternary_optab_supported_p direct_optab_supported_p
+#define direct_crc_optab_supported_p convert_optab_supported_p
So did we ever figure out why we're treating the CRC optab as a
conversion optab?

Re: RISC-V: Added support for CRC.

Reply via email to