https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78200
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
If the problem is that (In reply to Richard Biener from comment #9)
> So RTL expansion ends up in
>
>       /* If jumps are cheap and the target does not support conditional
>          compare, turn some more codes into jumpy sequences.  */
>       else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
>                && targetm.gen_ccmp_first == NULL)
>         {
>           if ((code2 == BIT_AND_EXPR
>                && TYPE_PRECISION (TREE_TYPE (op0)) == 1
>                && TREE_CODE (gimple_assign_rhs2 (second)) != INTEGER_CST)
>               || code2 == TRUTH_AND_EXPR)
>             {
>               code = TRUTH_ANDIF_EXPR;
>               op0 = gimple_assign_rhs1 (second);
>               op1 = gimple_assign_rhs2 (second);
>
> where we could adjust operand order based on the immediately dominating
> condition.
>
> Unfortunately sth as simple as
>
>   /* We'll expand RTL for op0 first, see if we'd better
>      expand RTL for op1 first.  */
>   if (TREE_CODE (op1) == SSA_NAME
>       && single_pred_p (bb))
>     {
>       gimple *def1 = SSA_NAME_DEF_STMT (op1);
>       if (is_gimple_assign (def1)
>           && TREE_CODE_CLASS (gimple_assign_rhs_code (def1)) == tcc_comparison)
>         {
>           basic_block pred = single_pred (bb);
>           gimple *last = last_stmt (pred);
>           if (last
>               && gimple_code (last) == GIMPLE_COND
>               && gimple_assign_rhs1 (def1) == gimple_cond_lhs (last))
>             std::swap (op0, op1);
>         }
>     }
>
> doesn't work as the predecessor is no longer in GIMPLE (we dropped the seq
> for the GIMPLE stmts and GIMPLE_CONDs have no DEFs...).  Also I'm not sure
> the half-way RTL CFG will still point to the original block.
>
> Of course the above heuristic is really only applicable if there's not much
> code expanded between this jump and the one in the predecessor.
>
> OTOH if we have the BRANCH_COST check during RTL expansion (similar to
> what we have for LOGICAL_OP_NON_SHORT_CIRCUIT in fold-const.c) then maybe
> if-combining shouldn't combine the conditionals.  There's a slight
> disconnect here, the above is BRANCH_COST < 4 while the other is
> BRANCH_COST >= 2 ...
>
> The cfgexpand code could also be done as a pre-pass on the IL turning
> the straight-line code back to CFG (I guess that's a good idea anyway).

Well, we could have some early expansion mini-pass or very late pass that
would try to detect these cases, guarded with the
  if (BRANCH_COST (optimize_bb_for_speed_p (bb), false) < 4)
condition: in particular, a bb with a single pred ending in
  if (something op const)
followed by a boolean BIT_{AND,IOR}_EXPR, where the bb contains no
(non-virtual) PHIs and only TERed stmts before the BIT_{AND,IOR}_EXPR.  If
the second of the conditions in there is also "something op const", swap the
operands of the BIT_{AND,IOR}_EXPR.  That way, at actual expansion time, when
the predecessor has already been converted into BB_RTL form, we don't need to
give up.  I guess the important question is whether that actually helps this
benchmark.
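A very rough sketch of what such a helper might look like (hypothetical code,
not an actual patch or anything from the GCC tree; the helper name is made
up, the checks mirror the #c9 snippet quoted above, and the PHI/TER
restrictions mentioned above are omitted):

/* Hypothetical sketch only (helper name made up, PHI/TER checks omitted):
   if BB's single predecessor PRED is still in GIMPLE form and ends in a
   GIMPLE_COND "something op const", and BB's own GIMPLE_COND tests a
   boolean BIT_AND_EXPR/BIT_IOR_EXPR whose second operand is defined by a
   comparison of that same "something" against a constant, swap the two
   operands so the comparison whose result can reuse the predecessor's
   compare is expanded first.  */

static void
maybe_swap_cond_operands (basic_block bb)
{
  if (!single_pred_p (bb))
    return;
  basic_block pred = single_pred (bb);
  if ((bb->flags & BB_RTL)
      || (pred->flags & BB_RTL)
      || BRANCH_COST (optimize_bb_for_speed_p (bb), false) >= 4)
    return;

  gimple *pred_cond = last_stmt (pred);
  if (!pred_cond
      || gimple_code (pred_cond) != GIMPLE_COND
      || TREE_CODE (gimple_cond_rhs (pred_cond)) != INTEGER_CST)
    return;

  gimple *cond = last_stmt (bb);
  if (!cond
      || gimple_code (cond) != GIMPLE_COND
      || TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME)
    return;

  gimple *second = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
  if (!is_gimple_assign (second)
      || (gimple_assign_rhs_code (second) != BIT_AND_EXPR
          && gimple_assign_rhs_code (second) != BIT_IOR_EXPR)
      || TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (second))) != 1)
    return;

  tree op2 = gimple_assign_rhs2 (second);
  if (TREE_CODE (op2) != SSA_NAME)
    return;
  gimple *def2 = SSA_NAME_DEF_STMT (op2);
  if (is_gimple_assign (def2)
      && TREE_CODE_CLASS (gimple_assign_rhs_code (def2)) == tcc_comparison
      && gimple_assign_rhs1 (def2) == gimple_cond_lhs (pred_cond)
      && TREE_CODE (gimple_assign_rhs2 (def2)) == INTEGER_CST)
    {
      /* BIT_AND_EXPR and BIT_IOR_EXPR are commutative, so the swap is
         always valid.  */
      tree op1 = gimple_assign_rhs1 (second);
      gimple_assign_set_rhs1 (second, op2);
      gimple_assign_set_rhs2 (second, op1);
      update_stmt (second);
    }
}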
On the #c4 testcase, if I compile with -Ofast -mavx2, I get:

        .p2align 4,,10
        .p2align 3
.L10:
        cmpl    $2, %esi
        jne     .L9
        testq   %r8, %r8
        jg      .L11
        .p2align 4,,10
        .p2align 3
.L9:
        addq    %r9, %rax
        cmpq    %rax, %rdx
        jbe     .L31
.L12:
        movl    24(%rax), %esi
        testl   %esi, %esi
        jle     .L9
        movq    (%rax), %r8
        testq   %r8, %r8
        jns     .L10
        cmpl    $1, %esi
        jne     .L9
.L11:
        addq    $1, %r10
        movq    st5(,%r10,8), %rsi
        movq    %rax, (%rsi)
        addq    %r9, %rax
        movq    %r8, 8(%rsi)
        movq    st5(,%rdi,8), %rsi
        movq    %r8, 16(%rsi)
        cmpq    %rax, %rdx
        ja      .L12

while if I swap op0 and op1 in the debugger at the #c9 spot above, I get:

        .p2align 4,,10
        .p2align 3
.L10:
        jle     .L9
        cmpl    $2, %esi
        je      .L11
        .p2align 4,,10
        .p2align 3
.L9:
        addq    %r9, %rax
        cmpq    %rax, %rdx
        jbe     .L31
.L12:
        movl    24(%rax), %esi
        testl   %esi, %esi
        jle     .L9
        movq    (%rax), %r8
        testq   %r8, %r8
        jns     .L10
        cmpl    $1, %esi
        jne     .L9
.L11:
        addq    $1, %r10
        movq    st5(,%r10,8), %rsi
        movq    %rax, (%rsi)
        addq    %r9, %rax
        movq    %r8, 8(%rsi)
        movq    st5(,%rdi,8), %rsi
        movq    %r8, 16(%rsi)
        cmpq    %rax, %rdx
        ja      .L12

instead, the difference being:

 .L10:
+       jle     .L9
        cmpl    $2, %esi
-       jne     .L9
-       testq   %r8, %r8
-       jg      .L11
+       je      .L11

It wouldn't even need to be a separate walk over all bbs, just a function
that, given a bb, would do this if both that bb and its single_pred still
don't have BB_RTL set; it could be invoked both from reorder_operands and
from expand_gimple_basic_block, with EDGE_TRUE/EDGE_FALSE successors for the
successors.  The only issue is that reorder_operands might actually undo
that if the operand we swap as second seems to be more expensive.  Another
possibility is to remember the old gimple seq of the basic blocks somewhere
(array indexed by bb number) and only throw it away after all bbs are
expanded.
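For that last alternative, a minimal sketch of what remembering the GIMPLE
seqs could look like (hypothetical names, nothing like this exists in
cfgexpand.c; it also glosses over keeping the saved statements alive for the
garbage collector until expansion finishes):

/* Hypothetical sketch only: stash each block's GIMPLE sequence, indexed by
   block number, before expansion starts, and free the array only after all
   bbs have been expanded.  Assumes the statements themselves stay alive
   until then.  */
static gimple_seq *saved_gimple_seqs;

static void
save_gimple_seqs (void)
{
  saved_gimple_seqs = XCNEWVEC (gimple_seq, last_basic_block_for_fn (cfun));
  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    saved_gimple_seqs[bb->index] = bb_seq (bb);
}

/* Last GIMPLE stmt of BB, usable even after BB has been turned into
   BB_RTL form, by falling back to the saved sequence.  */
static gimple *
saved_last_stmt (basic_block bb)
{
  if (!(bb->flags & BB_RTL))
    return last_stmt (bb);
  gimple_seq seq = saved_gimple_seqs[bb->index];
  return seq ? gimple_seq_last_stmt (seq) : NULL;
}

The array would then only be freed (XDELETEVEC) once the whole function has
been expanded, matching the "only throw it away after all bbs are expanded"
idea above.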