12] Replace the original CRC loops with a faster CRC calculation.

Richard Biener Tue, 09 Jul 2024 23:34:40 -0700

On Wed, Jul 10, 2024 at 3:17 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 5/24/24 2:42 AM, Mariam Arutunian wrote:
> >   Specifically, use the following alternatives: If the target
> >   supports the crc32 instruction, use it directly. If the target
> > supports the
> >   carry-less multiplication instruction, use it to calculate the CRC. If the
> >   target does not support either of the above, use a table-based CRC
> >   calculation.
> >
> > During initial checks, the loop's output CRC (i.e., the variable where
> > the calculated CRC is stored after the loop execution) is determined.
> > The entire loop is removed and replaced with an internal function call
> > (CRC, CRC_REV),
> > and the result is assigned to the output CRC variable.
> >
> >    gcc/
> >
> >      * gimple-crc-optimization.cc (get_data): New function.
> >      (faster_crc_code_generation): Likewise.
> >      (build_polynomial_without_1): Likewise.
> >      (execute): Add faster_crc_code_generation function call.
> >      * tree-loop-distribution.cc (destroy_loop): Remove, move function
> > to tree-ssa-loop-manip.cc.
> >      * tree-ssa-loop-manip.cc (destroy_loop): Add, move function from
> > tree-loop-distribution.cc.
> >      * tree-ssa-loop-manip.h (destroy_loop): Add extern function
> > declaration.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com
> > <mailto:mariamarutun...@gmail.com>>
> > diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc
> > index 039506c1059..c23bbf9f44c 100644
> > --- a/gcc/gimple-crc-optimization.cc
> > +++ b/gcc/gimple-crc-optimization.cc
>
> > @@ -1065,6 +1083,178 @@ crc_optimization::get_output_phi ()
> >    return nullptr;
> >  }
> >
> > +/* Build tree for the POLYNOMIAL (from its binary representation)
> > +   without the leading 1.  */
> > +
> > +tree
> > +crc_optimization::build_polynomial_without_1 (tree crc_arg, value 
> > *polynomial)
> > +{
> > +  unsigned HOST_WIDE_INT cst_polynomial = 0;
> > +  for (size_t i = 0; i < (*polynomial).length (); i++)
> > +    {
> > +      value_bit *const_bit;
> > +      if (m_is_bit_forward)
> > +     const_bit = (*polynomial)[(*polynomial).length () - 1 - i];
> > +      else
> > +     const_bit = (*polynomial)[i];
> > +      cst_polynomial <<= 1;
> > +      cst_polynomial ^= (as_a<bit *> (const_bit))->get_val () ? 1 : 0;
> > +    }
> > +  return build_int_cstu (TREE_TYPE (crc_arg), cst_polynomial);
> So I'm curious why we're using size_t for the loop iteration variable.
> It's probably not a big deal, but it just didn't seem like we're
> iterating over something that you'd normally count with a size_t.
>
>
>
> > +
> > +  /* Create a new variable for the data.  */
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file,
> > +          "Data and CRC are xor-ed before for loop.  Initializing data "
> > +          "with 0.\n");
> > +  tree type = nullptr;
> > +  /* Determine the data's size with the loop iteration count.
> > +     We assume that loop iteration count depends on the data's size.  */
> > +  if (data_size == TYPE_PRECISION (intQI_type_node))
> > +    type = intQI_type_node;
> > +  else if (data_size == TYPE_PRECISION (intHI_type_node))
> > +    type = intHI_type_node;
> > +  else if (data_size == TYPE_PRECISION (intSI_type_node))
> > +    type = intSI_type_node;
> > +  else if (data_size == TYPE_PRECISION (intDI_type_node))
> > +    type = intDI_type_node;
> > +  else if (data_size == TYPE_PRECISION (intTI_type_node))
> > +    type = intTI_type_node;
> > +  else
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +     fprintf (dump_file, "Couldn't determine the data's size.\n");
> > +      return nullptr;
> > +    }
> > +  return build_int_cstu (type, 0);
> Hmm, I think there's a routine that should return you a suitable type
> based on a precision.   build_nonstandard_integer_type would probably work.
>
> > +
> > +}
> > +
> > +/* Replaces CRC calculation loop with CRC_IFN call.
> > +   Returns true if replacement is succeeded, otherwise false.
> > +
> > +   First the function determines CRC, data and the polynomial
> > +   and depending on the CRC type, instead of the loop generates:
> > +   output_crc = IFN_CRC(_REV) (CRC, data, polynomial);  */
> > +
> > +bool
> > +crc_optimization::faster_crc_code_generation (function *fun,
> > +                                           value *polynomial,
> > +                                           gphi *output_crc)
> > +{
>
> > +
> > +  tree crc_arg = PHI_ARG_DEF (m_phi_for_crc, 1);
> > +  if (TYPE_MODE (TREE_TYPE (crc_arg)) > word_mode)
> So we probably want to tighten this (sorry I should have caught this
> earlier).  Perhaps:
>
> GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (crc_arg))) > GET_MODE_SIZE (word_mode)
>
> or something like that.
>
>
>
> >  unsigned int
> >  crc_optimization::execute (function *fun)
> >  {
> > @@ -1110,6 +1300,12 @@ crc_optimization::execute (function *fun)
> >         if (dump_file)
> >           fprintf (dump_file, "The loop with %d header BB index "
> >                               "calculates CRC!\n", 
> > m_crc_loop->header->index);
> > +
> > +       if (!faster_crc_code_generation (fun, polynom_value, output_crc))
> So it seems trivial, but I'm thinking that we probably need a better
> name for this function.  Perhaps "optimize_crc_loop"?
>
> I thought Richi may have suggested a change so that we didn't need to do
> quite as much manual removal of the loop and its associated
> datastructures.  Unfortunately I don't remember what that suggestion was
> -- but it would potentially simplify this patch though.


It sounds like my usual comment to leave work to CFG cleanup and DCE.
"Removing" a loop can be done by altering the exit condition to always exit
which makes CFG cleanup remove the loop structure and the backedge and
a followup DCE removes dead stmts.

Richard.

> Jeff
>
> ps.  I got your message and realize you're working on other projects
> now.  So no rush.
>

Re: [RFC/RFA] [PATCH 11/12] Replace the original CRC loops with a faster CRC calculation.

Reply via email to