https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387

--- Comment #12 from fdlbxtqi <euloanty at live dot com> ---
(In reply to CVS Commits from comment #11)
> The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:06bec55e80d98419121f3998d98d969990a75b0b
> 
> commit r11-3882-g06bec55e80d98419121f3998d98d969990a75b0b
> Author: Jakub Jelinek <ja...@redhat.com>
> Date:   Wed Oct 14 17:14:47 2020 +0200
> 
>     i386: Improve chaining of _{addcarry,subborrow}_u{32,64} [PR97387]
>     
>     These builtins have two known issues and this patch fixes one of them.
>     
>     One issue is that the builtins effectively return two results and
>     they make the destination addressable until expansion, which means
>     a stack slot is allocated for them and e.g. with -fstack-protector*
>     DSE isn't able to optimize that away.  I think for that we want to use
>     the technique of returning complex value; the patch doesn't handle that
>     though.  See PR93990 for that.
>     
>     The other problem is optimization of successive uses of the builtin
>     e.g. for arbitrary precision arithmetic additions/subtractions.
>     As shown in PR93990, combine is able to optimize the case when the first
>     argument to these builtins is 0 (the first instance when several are used
>     together), and also the last one if the last one ignores its result (i.e.
>     the carry/borrow is dead and thrown away in that case).
>     As shown in this PR, combiner refuses to optimize the rest, where it sees:
>     (insn 10 9 11 2 (set (reg:QI 88 [ _31 ])
>             (ltu:QI (reg:CCC 17 flags)
>                 (const_int 0 [0]))) "include/adxintrin.h":69:10 785 {*setcc_qi}
>          (expr_list:REG_DEAD (reg:CCC 17 flags)
>             (nil)))
>     - set pseudo 88 to CF from flags, then some uninteresting insns that
>     don't modify flags, and finally:
>     (insn 17 15 18 2 (parallel [
>                 (set (reg:CCC 17 flags)
>                     (compare:CCC (plus:QI (reg:QI 88 [ _31 ])
>                             (const_int -1 [0xffffffffffffffff]))
>                         (reg:QI 88 [ _31 ])))
>                 (clobber (scratch:QI))
>             ]) "include/adxintrin.h":69:10 350 {*addqi3_cconly_overflow_1}
>          (expr_list:REG_DEAD (reg:QI 88 [ _31 ])
>             (nil)))
>     to set CF in flags back to what we saved earlier.  The combiner just
>     punts trying to combine the 10, 17 and following addcarrydi (etc.)
>     instruction, because
>       if (i1 && !can_combine_p (i1, i3, i0, NULL, i2, NULL, &i1dest, &i1src))
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file, "Can't combine i1 into i3\n");
>           undo_all ();
>           return 0;
>         }
>     fails - the 3 insns aren't all adjacent and
>           || (! all_adjacent
>               && (((!MEM_P (src)
>                     || ! find_reg_note (insn, REG_EQUIV, src))
>                    && modified_between_p (src, insn, i3))
>     src (flags hard register) is modified between the first and third insn -
>     in the second insn.
>     
>     The following patch optimizes this by optimizing just the two insns,
>     10 and 17 above, i.e. save CF into pseudo, set CF from that pseudo, into
>     a nop.  The new define_insn_and_split matches how combine simplifies
>     those two together (except without the ix86_cc_mode change it was
>     choosing CCmode for the destination instead of CCCmode, so had to change
>     that function too, and also adjust costs so that combiner understands it
>     is beneficial).
>     
>     With this, all the testcases are optimized, so that the:
>             setc    %dl
>     ...
>             addb    $-1, %dl
>     insns in between the ad[dc][lq] or s[ub]b[lq] instructions are all
>     optimized away (sure, if something would clobber flags in between they
>     wouldn't, but there is nothing that can be done about that).
>     
>     2020-10-14  Jakub Jelinek  <ja...@redhat.com>
>     
>             PR target/97387
>             * config/i386/i386.md (CC_CCC): New mode iterator.
>             (*setcc_qi_addqi3_cconly_overflow_1_<mode>): New
>             define_insn_and_split.
>             * config/i386/i386.c (ix86_cc_mode): Return CCCmode
>             for *setcc_qi_addqi3_cconly_overflow_1_<mode> pattern operands.
>             (ix86_rtx_costs): Return true and *total = 0;
>             for *setcc_qi_addqi3_cconly_overflow_1_<mode> pattern.  Use op0
>             and op1 temporaries to simplify COMPARE checks.
>     
>             * gcc.target/i386/pr97387-1.c: New test.
>             * gcc.target/i386/pr97387-2.c: New test.
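
For reference, the "successive uses of the builtin" that the commit message
talks about look roughly like the sketch below. This is not one of the
pr97387 testcases, just a minimal illustration; add_limbs is a made-up name,
and the non-x86 fallback is only there (my assumption) so the snippet builds
elsewhere. On x86-64 each call's carry-out feeds the next call's carry-in,
which is the chain the patch lets combine keep in CF.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical helper: r = a + b over n little-endian 64-bit limbs.
   Returns the carry out of the most significant limb (0 or 1).  */
unsigned char
add_limbs (uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
  unsigned char carry = 0;  /* the first call in the chain gets carry-in 0 */

  for (size_t i = 0; i < n; i++)
    {
#if defined(__x86_64__)
      /* Chained use: carry-out of limb i is the carry-in of limb i + 1.  */
      unsigned long long sum;
      carry = _addcarry_u64 (carry, a[i], b[i], &sum);
      r[i] = sum;
#else
      /* Portable fallback (assumption, not part of the PR): detect
	 unsigned wraparound by comparing against an addend.  */
      uint64_t s = a[i] + b[i];
      unsigned char c1 = s < a[i];
      uint64_t t = s + carry;
      unsigned char c2 = t < s;
      r[i] = t;
      carry = c1 | c2;
#endif
    }
  return carry;
}
```

Whether a given compiler version keeps the whole chain free of setc/addb pairs
depends on unrolling and on nothing clobbering flags in between, as the commit
message notes.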

awesome
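
A small model of why the setc / addb $-1 pair quoted above can be turned into
a nop: insn 10 stores CF as a byte (0 or 1), and insn 17 sets CF from
(reg + -1) <u reg in QImode, which is true exactly when the byte was 1. The
function names below are made up for illustration; this is plain C modelling
the flag arithmetic, not GCC code.

```c
#include <stdint.h>

/* Model of "setc %dl" (insn 10): materialize CF as a byte, 0 or 1.  */
uint8_t
save_cf (int cf)
{
  return cf ? 1 : 0;
}

/* Model of "addb $-1, %dl" (insn 17): the CCC compare is
   (plus:QI reg -1) <u reg, i.e. the 8-bit addition wrapped around.  */
int
restore_cf (uint8_t saved)
{
  uint8_t sum = (uint8_t) (saved + 0xffu);  /* 8-bit add of -1 */
  return sum < saved;                       /* carry out of bit 7 */
}
```

So restore_cf (save_cf (cf)) gives back cf for both possible values, and the
round trip through the pseudo does not change CF.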
