https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387
--- Comment #12 from fdlbxtqi <euloanty at live dot com> ---
(In reply to CVS Commits from comment #11)
> The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:06bec55e80d98419121f3998d98d969990a75b0b
>
> commit r11-3882-g06bec55e80d98419121f3998d98d969990a75b0b
> Author: Jakub Jelinek <ja...@redhat.com>
> Date: Wed Oct 14 17:14:47 2020 +0200
>
> i386: Improve chaining of _{addcarry,subborrow}_u{32,64} [PR97387]
>
> These builtins have two known issues and this patch fixes one of them.
>
> One issue is that the builtins effectively return two results and
> they make the destination addressable until expansion, which means
> a stack slot is allocated for them and e.g. with -fstack-protector*
> DSE isn't able to optimize that away. I think for that we want to use
> the technique of returning complex value; the patch doesn't handle that
> though. See PR93990 for that.
>
> The other problem is optimization of successive uses of the builtin
> e.g. for arbitrary precision arithmetic additions/subtractions.
> As shown PR93990, combine is able to optimize the case when the first
> argument to these builtins is 0 (the first instance when several are used
> together), and also the last one if the last one ignores its result (i.e.
> the carry/borrow is dead and thrown away in that case).
> As shown in this PR, combiner refuses to optimize the rest, where it sees:
> (insn 10 9 11 2 (set (reg:QI 88 [ _31 ])
>         (ltu:QI (reg:CCC 17 flags)
>             (const_int 0 [0]))) "include/adxintrin.h":69:10 785 {*setcc_qi}
>      (expr_list:REG_DEAD (reg:CCC 17 flags)
>         (nil)))
> - set pseudo 88 to CF from flags, then some uninteresting insns that
> don't modify flags, and finally:
> (insn 17 15 18 2 (parallel [
>             (set (reg:CCC 17 flags)
>                 (compare:CCC (plus:QI (reg:QI 88 [ _31 ])
>                         (const_int -1 [0xffffffffffffffff]))
>                     (reg:QI 88 [ _31 ])))
>             (clobber (scratch:QI))
>         ]) "include/adxintrin.h":69:10 350 {*addqi3_cconly_overflow_1}
>      (expr_list:REG_DEAD (reg:QI 88 [ _31 ])
>         (nil)))
> to set CF in flags back to what we saved earlier. The combiner just punts
> trying to combine the 10, 17 and following addcarrydi (etc.) instruction,
> because
>   if (i1 && !can_combine_p (i1, i3, i0, NULL, i2, NULL, &i1dest, &i1src))
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "Can't combine i1 into i3\n");
>       undo_all ();
>       return 0;
>     }
> fails - the 3 insns aren't all adjacent and
>       || (! all_adjacent
>           && (((!MEM_P (src)
>                 || ! find_reg_note (insn, REG_EQUIV, src))
>                && modified_between_p (src, insn, i3))
> src (flags hard register) is modified between the first and third insn - in
> the second insn.
>
> The following patch optimizes this by optimizing just the two insns,
> 10 and 17 above, i.e. save CF into pseudo, set CF from that pseudo, into
> a nop. The new define_insn_and_split matches how combine simplifies those
> two together (except without the ix86_cc_mode change it was choosing CCmode
> for the destination instead of CCCmode, so had to change that function too,
> and also adjust costs so that combiner understand it is beneficial).
>
> With this, all the testcases are optimized, so that the:
>         setc %dl
>         ...
>         addb $-1, %dl
> insns in between the ad[dc][lq] or s[ub]b[lq] instructions are all optimized
> away (sure, if something would clobber flags in between they wouldn't, but
> there is nothing that can be done about that).
>
> 2020-10-14 Jakub Jelinek <ja...@redhat.com>
>
>         PR target/97387
>         * config/i386/i386.md (CC_CCC): New mode iterator.
>         (*setcc_qi_addqi3_cconly_overflow_1_<mode>): New
>         define_insn_and_split.
>         * config/i386/i386.c (ix86_cc_mode): Return CCCmode
>         for *setcc_qi_addqi3_cconly_overflow_1_<mode> pattern operands.
>         (ix86_rtx_costs): Return true and *total = 0;
>         for *setcc_qi_addqi3_cconly_overflow_1_<mode> pattern. Use op0 and
>         op1 temporaries to simplify COMPARE checks.
>
>         * gcc.target/i386/pr97387-1.c: New test.
>         * gcc.target/i386/pr97387-2.c: New test.

awesome
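
For anyone following along, here is a rough sketch of the kind of chained use the quoted commit is about: a 256-bit addition built from four successive _addcarry_u64 calls, each feeding its carry into the next. It assumes an x86-64 target and a compiler that ships <immintrin.h> (GCC or Clang). The names add256 and add256_selfcheck are made up for illustration and are not taken from the pr97387-*.c tests.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: r = a + b over four 64-bit limbs, least
   significant limb first.  Returns the carry out of the top limb.  */
static unsigned char
add256 (unsigned long long r[4], const unsigned long long a[4],
        const unsigned long long b[4])
{
  unsigned char c = 0;
  /* The first call takes a constant 0 carry-in; the following calls
     consume the carry produced by the previous one.  */
  c = _addcarry_u64 (c, a[0], b[0], &r[0]);
  c = _addcarry_u64 (c, a[1], b[1], &r[1]);
  c = _addcarry_u64 (c, a[2], b[2], &r[2]);
  c = _addcarry_u64 (c, a[3], b[3], &r[3]);
  return c;
}

/* Hypothetical self-check: (2^128 - 1) + 1 must carry through the two
   low limbs and leave 1 in the third limb, with no final carry.  */
static void
add256_selfcheck (void)
{
  unsigned long long a[4] = { ~0ULL, ~0ULL, 0, 0 };
  unsigned long long b[4] = { 1, 0, 0, 0 };
  unsigned long long r[4];
  unsigned char c = add256 (r, a, b);
  assert (c == 0);
  assert (r[0] == 0 && r[1] == 0 && r[2] == 1 && r[3] == 0);
}
```

Going by the commit message, the hope is that a compiler with this change emits the add/adc instructions back to back for code shaped like add256, without the setc / addb $-1 pairs in between. I have not checked the generated assembly for this particular snippet, so treat that as an expectation rather than a result.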