On 5/24/24 2:42 AM, Mariam Arutunian wrote:
Specifically, use the following alternatives: If the target
supports the crc32 instruction, use it directly. If the target
supports the
carry-less multiplication instruction, use it to calculate the CRC. If the
target does not support either of the above, use a table-based CRC
calculation.
During initial checks, the loop's output CRC (i.e., the variable where
the calculated CRC is stored after the loop execution) is determined.
The entire loop is removed and replaced with an internal function call
(CRC, CRC_REV),
and the result is assigned to the output CRC variable.
gcc/
* gimple-crc-optimization.cc (get_data): New function.
(faster_crc_code_generation): Likewise.
(build_polynomial_without_1): Likewise.
(execute): Add faster_crc_code_generation function call.
* tree-loop-distribution.cc (destroy_loop): Remove, move function
to tree-ssa-loop-manip.cc.
* tree-ssa-loop-manip.cc (destroy_loop): Add, move function from
tree-loop-distribution.cc.
* tree-ssa-loop-manip.h (destroy_loop): Add extern function
declaration.
Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com
<mailto:mariamarutun...@gmail.com>>
diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc
index 039506c1059..c23bbf9f44c 100644
--- a/gcc/gimple-crc-optimization.cc
+++ b/gcc/gimple-crc-optimization.cc
@@ -1065,6 +1083,178 @@ crc_optimization::get_output_phi ()
return nullptr;
}
+/* Build tree for the POLYNOMIAL (from its binary representation)
+ without the leading 1. */
+
+tree
+crc_optimization::build_polynomial_without_1 (tree crc_arg, value *polynomial)
+{
+ unsigned HOST_WIDE_INT cst_polynomial = 0;
+ for (size_t i = 0; i < (*polynomial).length (); i++)
+ {
+ value_bit *const_bit;
+ if (m_is_bit_forward)
+ const_bit = (*polynomial)[(*polynomial).length () - 1 - i];
+ else
+ const_bit = (*polynomial)[i];
+ cst_polynomial <<= 1;
+ cst_polynomial ^= (as_a<bit *> (const_bit))->get_val () ? 1 : 0;
+ }
+ return build_int_cstu (TREE_TYPE (crc_arg), cst_polynomial);
So I'm curious why we're using size_t for the loop iteration variable.
It's probably not a big deal, but it just didn't seem like we're
iterating over something that you'd normally count with a size_t.
+
+ /* Create a new variable for the data. */
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Data and CRC are xor-ed before for loop. Initializing data "
+ "with 0.\n");
+ tree type = nullptr;
+ /* Determine the data's size with the loop iteration count.
+ We assume that loop iteration count depends on the data's size. */
+ if (data_size == TYPE_PRECISION (intQI_type_node))
+ type = intQI_type_node;
+ else if (data_size == TYPE_PRECISION (intHI_type_node))
+ type = intHI_type_node;
+ else if (data_size == TYPE_PRECISION (intSI_type_node))
+ type = intSI_type_node;
+ else if (data_size == TYPE_PRECISION (intDI_type_node))
+ type = intDI_type_node;
+ else if (data_size == TYPE_PRECISION (intTI_type_node))
+ type = intTI_type_node;
+ else
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Couldn't determine the data's size.\n");
+ return nullptr;
+ }
+ return build_int_cstu (type, 0);
Hmm, I think there's a routine that should return you a suitable type
based on a precision. build_nonstandard_integer_type would probably work.
+
+}
+
+/* Replaces CRC calculation loop with CRC_IFN call.
+ Returns true if replacement is succeeded, otherwise false.
+
+ First the function determines CRC, data and the polynomial
+ and depending on the CRC type, instead of the loop generates:
+ output_crc = IFN_CRC(_REV) (CRC, data, polynomial); */
+
+bool
+crc_optimization::faster_crc_code_generation (function *fun,
+ value *polynomial,
+ gphi *output_crc)
+{
+
+ tree crc_arg = PHI_ARG_DEF (m_phi_for_crc, 1);
+ if (TYPE_MODE (TREE_TYPE (crc_arg)) > word_mode)
So we probably want to tighten this (sorry I should have caught this
earlier). Perhaps:
GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (crc_arg))) > GET_MODE_SIZE (word_mode)
or something like that.
unsigned int
crc_optimization::execute (function *fun)
{
@@ -1110,6 +1300,12 @@ crc_optimization::execute (function *fun)
if (dump_file)
fprintf (dump_file, "The loop with %d header BB index "
"calculates CRC!\n", m_crc_loop->header->index);
+
+ if (!faster_crc_code_generation (fun, polynom_value, output_crc))
So it seems trivial, but I'm thinking that we probably need a better
name for this function. Perhaps "optimize_crc_loop"?
I thought Richi may have suggested a change so that we didn't need to do
quite as much manual removal of the loop and its associated
datastructures. Unfortunately I don't remember what that suggestion was
-- but it would potentially simplify this patch though.
Jeff
ps. I got your message and realize you're working on other projects
now. So no rush.