The patch below increases the number of coalesces we attempt, so that they also cover unary and binary operations. This improves initial code generation for code like

int foo (int i, int j, int k, int l)
{
  int res = i;
  res += j;
  res += k;
  res += l;
  return res;
}

from

;; res_3 = i_1(D) + j_2(D);

(insn 9 8 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 87 [ i ])
                    (reg/v:SI 88 [ j ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (nil))

;; res_5 = res_3 + k_4(D);

(insn 10 9 0 (parallel [
            (set (reg/v:SI 84 [ res ])
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg/v:SI 89 [ k ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:5 -1
     (nil))
...

to

;; res_3 = i_1(D) + j_2(D);

(insn 9 8 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 85 [ i ])
                    (reg/v:SI 86 [ j ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (nil))

;; res_5 = res_3 + k_4(D);

(insn 10 9 0 (parallel [
            (set (reg/v:SI 83 [ res ])
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg/v:SI 87 [ k ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:5 -1
     (nil))

re-using the same pseudo for the LHS. Expansion has special code to
improve coalescing of op1 with target, thus this is what we try to
match here.

Overall there are positive and negative size effects during a
bootstrap on x86_64, but on balance it seems to be a loss - stage3 cc1
text size is 18261647 bytes without the patch compared to
18265751 bytes with the patch.

Now the question is what does this tell us? Is not re-using the same
pseudo as op and target always better?

Btw, I tried this to find a convincing metric for an intra-BB
scheduling pass (during out-of-SSA) on GIMPLE (to be able to
kill that odd scheduling code we now have in reassoc). And to
have something that TER does not immediately undo we have to
disable TER, which conveniently happens for coalesced SSA names.
Thus -> schedule for "register pressure", and thus reduce
SSA name lifetime - with the goal that out-of-SSA can do more
coalescing. But it won't even try to coalesce anything other
than PHI copies (not affected by scheduling) or plain SSA name
copies (shouldn't happen anyway due to copy propagation).

So - any ideas?
Or is the overall negative for cc1 just an artifact to ignore
and we _should_ coalesce as much as possible (even if it doesn't
avoid copies - thus the "cost" of 0 used in the patch)?

Otherwise the patch bootstraps and tests fine on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2014-04-17  Richard Biener  <rguent...@suse.de>

        * tree-ssa-coalesce.c (create_outofssa_var_map): Try to
        coalesce SSA name uses with SSA name results in all unary
        and binary operations.

Index: gcc/tree-ssa-coalesce.c
===================================================================
*** gcc/tree-ssa-coalesce.c     (revision 209469)
--- gcc/tree-ssa-coalesce.c     (working copy)
*************** create_outofssa_var_map (coalesce_list_p
*** 991,1007 ****
              case GIMPLE_ASSIGN:
                {
                  tree lhs = gimple_assign_lhs (stmt);
                  tree rhs1 = gimple_assign_rhs1 (stmt);
!                 if (gimple_assign_ssa_name_copy_p (stmt)
                      && gimple_can_coalesce_p (lhs, rhs1))
                    {
                      v1 = SSA_NAME_VERSION (lhs);
                      v2 = SSA_NAME_VERSION (rhs1);
!                     cost = coalesce_cost_bb (bb);
!                     add_coalesce (cl, v1, v2, cost);
                      bitmap_set_bit (used_in_copy, v1);
                      bitmap_set_bit (used_in_copy, v2);
                    }
                }
                break;
--- 993,1031 ----
              case GIMPLE_ASSIGN:
                {
                  tree lhs = gimple_assign_lhs (stmt);
+                 if (TREE_CODE (lhs) != SSA_NAME)
+                   break;
+
+                 /* Expansion handles target == op1 properly and also
+                    target == op2 for commutative binary ops.  */
                  tree rhs1 = gimple_assign_rhs1 (stmt);
!                 enum tree_code code = gimple_assign_rhs_code (stmt);
!                 enum gimple_rhs_class klass = get_gimple_rhs_class (code);
!                 if (TREE_CODE (rhs1) == SSA_NAME
                      && gimple_can_coalesce_p (lhs, rhs1))
                    {
                      v1 = SSA_NAME_VERSION (lhs);
                      v2 = SSA_NAME_VERSION (rhs1);
!                     add_coalesce (cl, v1, v2,
!                                   klass == GIMPLE_SINGLE_RHS
!                                   ? coalesce_cost_bb (bb) : 0);
                      bitmap_set_bit (used_in_copy, v1);
                      bitmap_set_bit (used_in_copy, v2);
                    }
+                 if (klass == GIMPLE_BINARY_RHS
+                     && commutative_tree_code (code))
+                   {
+                     tree rhs2 = gimple_assign_rhs2 (stmt);
+                     if (TREE_CODE (rhs2) == SSA_NAME
+                         && gimple_can_coalesce_p (lhs, rhs2))
+                       {
+                         v1 = SSA_NAME_VERSION (lhs);
+                         v2 = SSA_NAME_VERSION (rhs2);
+                         add_coalesce (cl, v1, v2, 0);
+                         bitmap_set_bit (used_in_copy, v1);
+                         bitmap_set_bit (used_in_copy, v2);
+                       }
+                   }
                }
                break;
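
For readers without the GIMPLE API at hand, the candidate selection in the new hunk can be modelled as a small standalone sketch. Everything below is a hypothetical toy: `toy_assign`, `toy_coalesce`, `toy_collect_coalesces` and the `RHS_*` values are illustration-only names, not GCC types or functions. SSA names are reduced to integer versions (with -1 meaning "not an SSA name"), `copy_cost` stands in for coalesce_cost_bb (bb), and the gimple_can_coalesce_p check is assumed to succeed and is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the GIMPLE RHS classes; not GCC's enum.  */
enum rhs_class { RHS_SINGLE, RHS_UNARY, RHS_BINARY };

/* Toy model of an assignment statement.  Operands are SSA versions,
   or -1 when the operand is not an SSA name.  */
struct toy_assign
{
  int lhs;
  int rhs1;
  int rhs2;
  enum rhs_class klass;
  int commutative;      /* Nonzero for operations such as PLUS.  */
};

/* One recorded coalesce attempt between versions V1 and V2.  */
struct toy_coalesce
{
  int v1;
  int v2;
  int cost;
};

/* Record the coalesce candidates for STMT in OUT, which must have
   room for two entries, and return how many were recorded.
   COPY_COST models coalesce_cost_bb (bb).  */
size_t
toy_collect_coalesces (const struct toy_assign *stmt, int copy_cost,
                       struct toy_coalesce *out)
{
  size_t n = 0;

  assert (stmt != NULL && out != NULL);

  /* Only statements defining an SSA name are of interest.  */
  if (stmt->lhs < 0)
    return 0;

  /* Result with operand 1: real copies keep their cost, everything
     else is recorded with cost 0 since no copy is avoided.  */
  if (stmt->rhs1 >= 0)
    {
      out[n].v1 = stmt->lhs;
      out[n].v2 = stmt->rhs1;
      out[n].cost = stmt->klass == RHS_SINGLE ? copy_cost : 0;
      n++;
    }

  /* Result with operand 2, for commutative binary operations only.  */
  if (stmt->klass == RHS_BINARY && stmt->commutative && stmt->rhs2 >= 0)
    {
      out[n].v1 = stmt->lhs;
      out[n].v2 = stmt->rhs2;
      out[n].cost = 0;
      n++;
    }

  return n;
}
```

In this model a statement like res_3 = i_1 + j_2 yields two zero-cost candidates, (3, 1) and (3, 2), a plain copy yields a single candidate carrying the copy cost, and a non-commutative binary operation yields only the operand 1 candidate.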