On Wed, Jul 10, 2024 at 3:17 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 5/24/24 2:42 AM, Mariam Arutunian wrote: > > Specifically, use the following alternatives: If the target > > supports the crc32 instruction, use it directly. If the target > > supports the > > carry-less multiplication instruction, use it to calculate the CRC. If the > > target does not support either of the above, use a table-based CRC > > calculation. > > > > During initial checks, the loop's output CRC (i.e., the variable where > > the calculated CRC is stored after the loop execution) is determined. > > The entire loop is removed and replaced with an internal function call > > (CRC, CRC_REV), > > and the result is assigned to the output CRC variable. > > > > gcc/ > > > > * gimple-crc-optimization.cc (get_data): New function. > > (faster_crc_code_generation): Likewise. > > (build_polynomial_without_1): Likewise. > > (execute): Add faster_crc_code_generation function call. > > * tree-loop-distribution.cc (destroy_loop): Remove, move function > > to tree-ssa-loop-manip.cc. > > * tree-ssa-loop-manip.cc (destroy_loop): Add, move function from > > tree-loop-distribution.cc. > > * tree-ssa-loop-manip.h (destroy_loop): Add extern function > > declaration. > > > > Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com > > <mailto:mariamarutun...@gmail.com>> > > diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc > > index 039506c1059..c23bbf9f44c 100644 > > --- a/gcc/gimple-crc-optimization.cc > > +++ b/gcc/gimple-crc-optimization.cc > > > @@ -1065,6 +1083,178 @@ crc_optimization::get_output_phi () > > return nullptr; > > } > > > > +/* Build tree for the POLYNOMIAL (from its binary representation) > > + without the leading 1. */ > > + > > +tree > > +crc_optimization::build_polynomial_without_1 (tree crc_arg, value > > *polynomial) > > +{ > > + unsigned HOST_WIDE_INT cst_polynomial = 0; > > + for (size_t i = 0; i < (*polynomial).length (); i++) > > + { > > + value_bit *const_bit; > > + if (m_is_bit_forward) > > + const_bit = (*polynomial)[(*polynomial).length () - 1 - i]; > > + else > > + const_bit = (*polynomial)[i]; > > + cst_polynomial <<= 1; > > + cst_polynomial ^= (as_a<bit *> (const_bit))->get_val () ? 1 : 0; > > + } > > + return build_int_cstu (TREE_TYPE (crc_arg), cst_polynomial); > So I'm curious why we're using size_t for the loop iteration variable. > It's probably not a big deal, but it just didn't seem like we're > iterating over something that you'd normally count with a size_t. > > > > > + > > + /* Create a new variable for the data. */ > > + > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, > > + "Data and CRC are xor-ed before for loop. Initializing data " > > + "with 0.\n"); > > + tree type = nullptr; > > + /* Determine the data's size with the loop iteration count. > > + We assume that loop iteration count depends on the data's size. */ > > + if (data_size == TYPE_PRECISION (intQI_type_node)) > > + type = intQI_type_node; > > + else if (data_size == TYPE_PRECISION (intHI_type_node)) > > + type = intHI_type_node; > > + else if (data_size == TYPE_PRECISION (intSI_type_node)) > > + type = intSI_type_node; > > + else if (data_size == TYPE_PRECISION (intDI_type_node)) > > + type = intDI_type_node; > > + else if (data_size == TYPE_PRECISION (intTI_type_node)) > > + type = intTI_type_node; > > + else > > + { > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, "Couldn't determine the data's size.\n"); > > + return nullptr; > > + } > > + return build_int_cstu (type, 0); > Hmm, I think there's a routine that should return you a suitable type > based on a precision. build_nonstandard_integer_type would probably work. > > > + > > +} > > + > > +/* Replaces CRC calculation loop with CRC_IFN call. > > + Returns true if replacement is succeeded, otherwise false. > > + > > + First the function determines CRC, data and the polynomial > > + and depending on the CRC type, instead of the loop generates: > > + output_crc = IFN_CRC(_REV) (CRC, data, polynomial); */ > > + > > +bool > > +crc_optimization::faster_crc_code_generation (function *fun, > > + value *polynomial, > > + gphi *output_crc) > > +{ > > > + > > + tree crc_arg = PHI_ARG_DEF (m_phi_for_crc, 1); > > + if (TYPE_MODE (TREE_TYPE (crc_arg)) > word_mode) > So we probably want to tighten this (sorry I should have caught this > earlier). Perhaps: > > GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (crc_arg))) > GET_MODE_SIZE (word_mode) > > or something like that. > > > > > unsigned int > > crc_optimization::execute (function *fun) > > { > > @@ -1110,6 +1300,12 @@ crc_optimization::execute (function *fun) > > if (dump_file) > > fprintf (dump_file, "The loop with %d header BB index " > > "calculates CRC!\n", > > m_crc_loop->header->index); > > + > > + if (!faster_crc_code_generation (fun, polynom_value, output_crc)) > So it seems trivial, but I'm thinking that we probably need a better > name for this function. Perhaps "optimize_crc_loop"? > > I thought Richi may have suggested a change so that we didn't need to do > quite as much manual removal of the loop and its associated > datastructures. Unfortunately I don't remember what that suggestion was > -- but it would potentially simplify this patch though.
It sounds like my usual comment to leave work to CFG cleanup and DCE. "Removing" a loop can be done by altering the exit condition to always exit which makes CFG cleanup remove the loop structure and the backedge and a followup DCE removes dead stmts. Richard. > Jeff > > ps. I got your message and realize you're working on other projects > now. So no rush. >