On Fri, Aug 2, 2024 at 2:12 PM Richard Biener <richard.guent...@gmail.com>
wrote:

> On Wed, Jul 31, 2024 at 10:15 AM Mariam Arutunian
> <mariamarutun...@gmail.com> wrote:
> >
> >  This patch adds a new compiler pass aimed at identifying naive CRC
> implementations,
> > characterized by the presence of a loop calculating a CRC (polynomial
> long division).
> >  Upon detection of a potential CRC, the pass prints an informational
> message.
> >
> >  Performs CRC optimization if optimization level is >= 2,
> > besides optimizations for size and if fno_gimple_crc_optimization given.
> >
> >  This pass is added for the detection and optimization of naive CRC
> implementations,
> > improving the efficiency of CRC-related computations.
> >
> >   This patch includes only initial fast checks for filtering out
> non-CRCs,
> > detected possible CRCs verification and optimization parts will be
> provided in subsequent patches.
>
> +fgimple-crc-optimization
>
> Please do not put IL names in flags exposed to users, this should be
> -foptimize-crc
>
>
Ok.


> +This flag is enabled by default at @option{-O2} and higher
> +if @option{-Os} is not also specified.
>
> Please leave out "if @option{-Os} is not also specified".  I wonder why
> for example if the CPU has a CRC instruction and the polynomial matches
> we would disable this when optimizing for size?  A CRC instruction should
> be smaller?
>

I wanted to disable it only in the case of table-based CRC.
However, I couldn't find a way to determine whether the CRC loop would be
replaced with a table-based implementation, a CRC instruction, or a
CLMUL-based implementation beforehand.


> +/* This pass performs CRC optimization.  */
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "tree-cfg.h"
> +#include "tree-ssa-loop-niter.h"
>
> please try to prune the set of #include (I guess you've
> copied this from elsewhere).


I guess those unnecessary includes were left over from when I was trying to
remove the CRC loop.  I tried and used many codes from different files,
then changed that removal part, but forgot to remove includes.


>
+         /* Don't continue searching if encountered the loop's beginning.
> */
> +         if (bb_loop_header_p (gimple_bb (stmt)))
> +           return nullptr;
>
> gimple_bb (stmt) == m_crc_loop->header
>
> it looks like this loop ignores other uses of xored_crc, like in
> calls?  Shouldn't
> there be a
>
>     else if (!is_gimple_debug (stmt))
>        return nullptr;
>
> in the FOR_EACH_IMM_USE_FAST loop?
>

Yes, it ignores them.
For example there can be a print function that prints CRC's value on each
iteration, which is normal.
These initial checks weren't very strict because a validation step follows.
However, after considering your comment, I'm thinking of making the initial
checks more strict. For this same example, if there is a print  within the
loop, the current approach will not replace the CRC computation, so I can
filter out such cases during the initial checks.
If later we start to replace wider types of CRC implementations, we can
modify this part of the code accordingly.


> It looks like loop_may_calculate_crc walks all loops XOR stmts and
> for each xor_calculates_crc you call set_bbs_stmts_not_visited.
> This looks like it would be quadratic in complexity considering a loop
> with 1000s of XOR stmts _not_ computing a CRC?  This would be
> solved by aborting the search when the first found XOR is not
> a CRC for example.  Or by doing the initialization only once.
>
>
Aborting after the first check won't work, as in CRC loops may be more than
one xor operation. For example, there can be one for xoring CRC's and
data's leading bits, and another for xoring with the polynomial.
We can now abort after the second check, as we now support only the most
common CRC cases, where up to 2 xor operations may occur.
If we add support for more complex CRC loops, we can update this part as
well.

It also seems that in xor_calculates_crc finding the
> shift after the XOR is much cheaper than the one before
> as that collects dependent stmts (but we unset visited on all
> stmts _again_).  May I suggest to use a bitmap of SSA def
> SSA_NAME_VERSIONs instead of using the gimple visited flag?
>
>
I don't know how it works, but I'll figure it out and use it. Thanks.


> You are collecting a possibly very large use-def chain but
> only very simple checks are later done on it.
>
> Yes.
In my earlier versions I was going up by use-def chain and doing the checks
immediately. But it wasn't good and easy to understand when I was doing
many different things within one function, so  I changed the
implementation. Also I had written that way, as in the earlier stages of
the development I wanted to detect as many CRC cases as possible. But, as
now we support less cases, I can change this part too.


Thanks, for your comments. I'll change these checks to detect more specific
cases, thus reducing the work when the loop is not calculating CRC.
However, I'll ensure that all the cases we're currently detecting in the
GCC test suite will continue to be detected even after the changes.

BR,
Mariam.

Richard.
>
> >      gcc/
> >
> >        * Makefile.in (OBJS): Add gimple-crc-optimization.o.
> >        * common.opt (fgimple-crc-optimization): New option.
> >        * common.opt.urls: Regenerate to add
> >        fgimple-crc-optimization.
> >        * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
> >        * gimple-crc-optimization.cc: New file.
> >        * gimple.cc (set_phi_stmts_not_visited): New function.
> >        (set_gimple_stmts_not_visited): Likewise.
> >        (set_bbs_stmts_not_visited): Likewise.
> >        * gimple.h (set_gimple_stmts_not_visited): New extern function
> declaration.
> >        (set_phi_stmts_not_visited): New extern function declaration.
> >        (set_bbs_stmts_not_visited): New extern function declaration.
> >        * opts.cc (default_options_table): Add
> OPT_fgimple_crc_optimization.
> >        (enable_fdo_optimizations): Enable gimple-crc-optimization.
> >        * passes.def (pass_crc_optimization): Add new pass.
> >        * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
> >        * tree-pass.h (make_pass_crc_optimization): New extern function
> declaration.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com>
>

Reply via email to