Updated the patch so that it applies.

Tested on the SPEC2017 fprate cases again. With the options
"-funroll-loops -Ofast -flto", the improvements for 1-copy runs are:

Ampere1:
    508.namd_r          4.26%
    510.parest_r        2.55%
    Overall             0.54%
Intel Xeon:
    503.bwaves_r        1.3%
    508.namd_r          1.58%
    Overall             0.42%
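
As a rough illustration of the expression shapes involved, here is a hypothetical sketch. It is not code from the patch or from SPEC2017, and the function names are made up. It shows a flat sum of products, the same sum regrouped the way a rewrite to parallel form would split it, and a nested chain in which each multiply-add result is multiplied again. This is my reading of "nested FMA chains (those connected by MULT_EXPRs)" in the quoted message, which reports that rewriting such chains to parallel form generated worse code.

```c
/* Hypothetical examples; all names below are illustrative only. */

/* Flat chain: a sum of independent products. Each product can be
   fused with an addition into the running sum. */
double flat_chain(const double *a, const double *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* The same sum split into two independent halves, roughly the shape
   a rewrite to parallel form aims for: the two halves do not depend
   on each other, so the dependency chain is shorter. */
double flat_chain_parallel(const double *a, const double *b)
{
    double lo = a[0] * b[0] + a[1] * b[1];
    double hi = a[2] * b[2] + a[3] * b[3];
    return lo + hi;
}

/* Nested chain: the result of each multiply-add is itself an operand
   of the next multiplication, so the multiply-adds are connected
   through multiplications rather than through a flat sum. */
double nested_chain(const double *c, double x)
{
    return ((c[3] * x + c[2]) * x + c[1]) * x + c[0];
}
```

Whether the compiler actually emits FMA instructions for these expressions depends on the target and on the floating-point contraction settings in effect.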


Thanks,
Di Zhao


> -----Original Message-----
> From: Di Zhao OS
> Sent: Friday, June 16, 2023 4:51 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/110279- Check for nested FMA chains in
> reassoc
> 
> This patch fixes the regressions found in the SPEC2017 fprate cases
>  on aarch64.
> 
> 1. Reuse the code in the widening_mul pass to check for nested FMA
>  chains (those connected by MULT_EXPRs), since rewriting them to
>  parallel generates worse code.
> 
> 2. Avoid rearranging when doing so would produce fewer FMA chains,
>  which can be slow.
> 
> Tested on ampere1 and neoverse-n1; this fixes the regressions in the
> 508.namd_r and 510.parest_r 1-copy runs. While I'm still collecting data
> on the x86 machines we have, I'd like to know what you think of this.
> 
> (Previously I tried to improve things with FMA by adding a widening_mul
> pass before reassoc2, since that makes it easier to recognize different
> patterns of FMA chains and decide whether to split them. But I suppose
> handling them all in the reassoc pass is more efficient.)
> 
> Thanks,
> Di Zhao
> 
> ---
> gcc/ChangeLog:
> 
>         * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
>         Support a new mode that only does the checking.
>         (struct fma_transformation_info): Moved to header.
>         (class fma_deferring_state): Moved to header.
>         (convert_mult_to_fma): Add new parameter.
>         * tree-ssa-math-opts.h (struct fma_transformation_info):
>         (class fma_deferring_state): Moved from .cc.
>         (convert_mult_to_fma): Add function decl.
>         * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
>         (rank_ops_for_fma): Return -1 if nested FMAs are found.
>         (reassociate_bb): Avoid rewriting to parallel if nested FMAs are
>         found.

Attachment: 0001-Check-for-nested-FMA-chains-in-reassoc.patch
