On Sat, May 25, 2024 at 9:40 PM Jeff Law <jeffreya...@gmail.com> wrote:
> > > On 5/24/24 2:41 AM, Mariam Arutunian wrote: > > Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster > > CRC generation. > > One performs bit-forward and the other bit-reversed CRC computation. > > If CRC optabs are supported, they are used for the CRC computation. > > Otherwise, table-based CRC is generated. > > The supported data and CRC sizes are 8, 16, 32, and 64 bits. > > The polynomial is without the leading 1. > > A table with 256 elements is used to store precomputed CRCs. > > For the reflection of inputs and the output, a simple algorithm involving > > SHIFT, AND, and OR operations is used. > > > > Co-authored-by: Joern Rennecke <joern.renne...@embecosm.com > > <mailto:joern.renne...@embecosm.com>> > > > > gcc/ > > > > * doc/md.texi (crc@var{m}@var{n}4, > > crc_rev@var{m}@var{n}4): Document. > > * expr.cc (generate_crc_table): New function. > > (calculate_table_based_CRC): Likewise. > > (expand_crc_table_based): Likewise. > > (gen_common_operation_to_reflect): Likewise. > > (reflect_64_bit_value): Likewise. > > (reflect_32_bit_value): Likewise. > > (reflect_16_bit_value): Likewise. > > (reflect_8_bit_value): Likewise. > > (generate_reflecting_code_standard): Likewise. > > (expand_reversed_crc_table_based): Likewise. > > * expr.h (generate_reflecting_code_standard): New function > declaration. > > (expand_crc_table_based): Likewise. > > (expand_reversed_crc_table_based): Likewise. > > * internal-fn.cc: (crc_direct): Define. > > (direct_crc_optab_supported_p): Likewise. > > (expand_crc_optab_fn): New function > > * internal-fn.def (CRC, CRC_REV): New internal functions. > > * optabs.def (crc_optab, crc_rev_optab): New optabs. > > > > Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com > > <mailto:mariamarutun...@gmail.com>> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index 5730bda80dc..be68ef860f9 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered > with operand 2. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{crc@var{m}@var{n}4} instruction pattern > > +@item @samp{crc@var{m}@var{n}4} > > +Calculate a bit-forward CRC using operands 1, 2 and 3, > > +then store the result in operand 0. > > +Operands 1 is the initial CRC, operands 2 is the data and operands 3 is > the > > +polynomial without leading 1. > > +Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, > where > > +both modes are integers. The size of CRC to be calculated is > determined by the > > +mode; for example, if @var{n} is 'hi', a CRC16 is calculated. > > + > > +@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern > > +@item @samp{crc_rev@var{m}@var{n}4} > > +Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed > CRC. > > + > So just to be clear, this is a case where the input (operand 2) may have > a different mode than the output (operand 0). That scenario is > generally discouraged, with a few exceptions (the most common being > shift counts which are often QImode objects while the > value-to-be-shifted and the output value are potentially any scalar > integer mode. > > So I don't think this is a problem, just wanted to point it out to > anyone else that may be looking at this code. > > > > @end table > > > > @end ifset > > diff --git a/gcc/expr.cc b/gcc/expr.cc > > index 1baa39b98eb..18368ae6b6c 100644 > > --- a/gcc/expr.cc > > +++ b/gcc/expr.cc > > @@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp) > > > > return tree_to_shwi (size); > > } > > + > > +/* Calculate CRC for the initial CRC and given POLYNOMIAL. > > + CRC_BITS is CRC size. */ > > + > > +static unsigned HOST_WIDE_INT > > +calculate_crc (unsigned HOST_WIDE_INT crc, > > + unsigned HOST_WIDE_INT polynomial, > > + unsigned crc_bits) > Just a nit. Line up the polynomial & crc_bits declarations with the crc > declaration. > > I carefully reviewed the indentation of the code using different editors and viewers, and everything appeared correct. I double-checked the specific sections mentioned, and they also looked right. In this reply message I see that it's not correct. I'll try to fix it. > > > +{ > > + crc = crc << (crc_bits - 8); > > + for (int i = 8; i > 0; --i) > > + { > > + if ((crc >> (crc_bits - 1)) & 1) > > + crc = (crc << 1) ^ polynomial; > > + else > > + crc <<= 1; > > + } > > + > > + crc <<= (sizeof (crc) * BITS_PER_UNIT - crc_bits); > > + crc >>= (sizeof (crc) * BITS_PER_UNIT - crc_bits); > Another nit. Just once space after the <<= or >>= operators. > > Yes, this one hadn't noticed. Thanks. > > + > > + return crc; > > +} > > + > > +/* Assemble CRC table with 256 elements for the given POLYNOM and > CRC_BITS with > > + given ID. > > + ID is the identifier of the table, the name of the table is unique, > > + contains CRC size and the polynomial. > > + POLYNOM is the polynomial used to calculate the CRC table's elements. > > + CRC_BITS is the size of CRC, may be 8, 16, ... . */ > > + > > +rtx > > +assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned > crc_bits) > > +{ > > + unsigned table_el_n = 0x100; > > + tree ar = build_array_type (make_unsigned_type (crc_bits), > > + build_index_type (size_int (table_el_n - > 1))); > Nit. Line up build_index_type at the same indention as make_unsigned_type. > > Note that with TREE_READONLY set, there is at least some chance that the > linker will find identical tables and merge them. I haven't tested > this, but I know it happens for other objects in the constant pools. > > Considering that identical tables would likely have the same name, their merging shouldn't pose a problem in principle. However, it's worth noting the potential issue if someone intentionally creates different tables with the same name. > > > + sprintf (buf, "crc_table_for_crc_%u_polynomial_" > HOST_WIDE_INT_PRINT_HEX, > > + crc_bits, polynom); > Another formatting nit. Line up the arguments when you need to wrap a > function call. ie > > foo (arg1, arg2.... > arg3, arg4....) > > > + > > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > > + POLYNOMIAL (without leading 1). > > + > > + First, using POLYNOMIAL's value generates CRC table of 256 elements, > > + then generates the assembly for the following code, > > + where crc_size and data_size may be 8, 16, 32, 64, depending on CRC: > > + > > + for (int i = 0; i < data_size / 8; i++) > > + crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8)) > > + ^ (data >> (data_size - (i + 1) * 8) & > 0xFF))]; > > + > > + So to take values from the table, we need 8-bit data. > > + If input data size is not 8, then first we extract upper 8 bits, > > + then the other 8 bits, and so on. */ > > + > > +void > > +calculate_table_based_CRC (rtx *crc, const rtx &input_data, > > + const rtx &polynomial, > > + machine_mode crc_mode, machine_mode data_mode) > Same nit as before. Line up the parameters. It looks like we need to > check for that throughout this function when you wrapped function > argument lists. > > > > + > > +/* Generate table-based CRC code for the given CRC, INPUT_DATA and the > > + POLYNOMIAL (without leading 1). > > + > > + CRC is OP1, data is OP2 and the polynomial is OP3. > > + This must generate a CRC table and an assembly for the following > code, > > + where crc_size and data_size may be 8, 16, 32, 64: > > + uint_crc_size_t > > + crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, > size_t size) > > + { > > + uint_crc_size_t crc = crc_init; > > + for (int i = 0; i < data_size / 8; i++) > > + crc = (crc << 8) > > + ^ crc_table[(crc >> (crc_size - 8)) > > + ^ (data >> (data_size - (i + 1) * 8) & 0xFF))]; > > + return crc; > > + } */ > > + > > +void > > +expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3, > > + machine_mode data_mode) > > +{ > > + gcc_assert (!CONST_INT_P (op0)); > > + gcc_assert (CONST_INT_P (op3)); > > + machine_mode crc_mode = GET_MODE (op0); > > + rtx crc = gen_reg_rtx (word_mode); > > + convert_move (crc, op1, 0); > > + calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode); > > + if (crc_mode == SImode && word_mode == DImode) > > + { > > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > > + } > > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); > The explicit checks for SI/DI mode don't look right here. What I think > you're looking for is crc_mode < word_mode. If you wanted to be even > omre precise you could check the sizes of the two modes. > Thanks. I'll check the sizes then, as I need to check that crc_mode size is 32 and word_mode is 64, only this one case. Along the same lines, I don't think you want/need the call to > simplify_gen_subreg when word_mode == crc_mode. ISTM that you only want > the subreg when crc_mode < word_mode. > > Presumably we rejected the optimization earlier if crc_mode > word_mode? > > Yes. > > > > > + emit_move_insn (tgt, crc); > > +} > > + > > +/* Generate the common operation for reflecting values: > > + *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> > SHIFT_VAL; */ > > + > > +void > > +gen_common_operation_to_reflect (rtx *op, > > + unsigned HOST_WIDE_INT and1_value, > > + unsigned HOST_WIDE_INT and2_value, > > + unsigned shift_val) > > +{ > > + rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, > word_mode), > > + NULL_RTX); > > + op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0); > > + rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, > word_mode), > > + NULL_RTX); > > + op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1); > > + *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, > OPTAB_DIRECT); > > +} > Same nits as elsewhere. Line up the parameters & arguments when you > need to wrap lines. > > > > > + > > +/* Reflect 64-bit value for the 64-bit target. */ > > + > > +void > > +reflect_64_bit_value (rtx *op) > > +{ > > + gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, > 0xFFFFFFFF00000000, > > + 32); > > + gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, > 0xFFFF0000FFFF0000, > > + 16); > > + gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, > 0xFF00FF00FF00FF00, > > + 8); > > + gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, > 0xF0F0F0F0F0F0F0F0, > > + 4); > > + gen_common_operation_to_reflect (op, 0x3333333333333333, > 0xCCCCCCCCCCCCCCCC, > > + 2); > > + gen_common_operation_to_reflect (op, 0x5555555555555555, > 0xAAAAAAAAAAAAAAAA, > > + 1); > Another class of nits. In general, if you're wrapping a long line like > these above, consider "balancing" the length of the lines a bit better. > so instead of bringing down just the final constant, bring down the > final two constants to a new line. I think that's what we generally do > elsehwere. > > Ok. > > > + > > + gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size); > > + if (crc_mode == SImode && word_mode == DImode) > > + { > > + rtx a_low = gen_lowpart_SUBREG (crc_mode, crc); > > + crc = gen_rtx_SIGN_EXTEND (word_mode, a_low); > > + } > > + rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0); > > + emit_move_insn (tgt, crc); > Same concerns as in the bit-forward implementation WRT testing SI/DI > explicitly. Testing crc_mode < word_mode or testing their bitsizes > seems safer. > > > +/* Expand CRC call STMT. */ > > + > > +static void > > +expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab) > > +{ > > + tree lhs = gimple_call_lhs (stmt); > > + tree rhs1 = gimple_call_arg (stmt, 0); // crc > > + tree rhs2 = gimple_call_arg (stmt, 1); // data > > + tree rhs3 = gimple_call_arg (stmt, 2); // polynomial > > + > > + tree result_type = TREE_TYPE (lhs); > > + tree data_type = TREE_TYPE (rhs2); > > + > > + gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type)); > > + gcc_assert (word_mode >= TYPE_MODE (result_type)); > > + > > + rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > + rtx crc = expand_normal (rhs1); > > + rtx data = expand_normal (rhs2); > > + gcc_assert (TREE_CODE (rhs3) == INTEGER_CST); > > + rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type), > > + TREE_INT_CST_LOW (rhs3)); > Nit. Looks like this code got over-indented. Just two spaces of > indention after the open-curly. > > > In general, most of the concerns with this patch are formatting nits. > The functional concerns are with those mode checks as mentioned in a > couple places above. Repost once you've got an update fixing these issues. > Thanks, > Jeff I guess the indentation problem is because of the tab size. I'll fix all the suggestions. Thank you very much. Mariam > >