On Wed, Oct 31, 2012 at 8:16 PM, Easwaran Raman <era...@google.com> wrote:
> On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman <era...@google.com> wrote:
>>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman <era...@google.com> wrote:
>>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman <era...@google.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> During expression reassociation, statements are conservatively moved
>>>>>>> downwards to ensure that dependences are correctly satisfied after
>>>>>>> reassociation. This can lengthen live ranges. This patch moves
>>>>>>> statements only as far as necessary. Bootstraps with no test
>>>>>>> regressions on x86_64/linux. OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Easwaran
>>>>>>>
>>>>>>> 2012-10-18  Easwaran Raman  <era...@google.com>
>>>>>>>         * tree-ssa-reassoc.c (assign_uids): New function.
>>>>>>>         (assign_uids_in_relevant_bbs): Likewise.
>>>>>>>         (ensure_ops_are_available): Likewise.
>>>>>>>         (rewrite_expr_tree): Do not move statements beyond what is
>>>>>>>         necessary. Remove call to swap_ops_for_binary_stmt...
>>>>>>>         (reassociate_bb): ... and move it here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/tree-ssa-reassoc.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy)
>>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
>>>>>>>      }
>>>>>>>  }
>>>>>>>
>>>>>>> +/* Assign UIDs to statements in basic block BB.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +assign_uids (basic_block bb)
>>>>>>> +{
>>>>>>> +  unsigned uid = 0;
>>>>>>> +  gimple_stmt_iterator gsi;
>>>>>>> +  /* First assign uids to phis.  */
>>>>>>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>> +    {
>>>>>>> +      gimple stmt = gsi_stmt (gsi);
>>>>>>> +      gimple_set_uid (stmt, uid++);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Then assign uids to stmts.  */
>>>>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>> +    {
>>>>>>> +      gimple stmt = gsi_stmt (gsi);
>>>>>>> +      gimple_set_uid (stmt, uid++);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* For each operand in OPS, find the basic block that contains the statement
>>>>>>> +   which defines the operand.  For all such basic blocks, assign UIDs.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>>>>>>> +{
>>>>>>> +  operand_entry_t oe;
>>>>>>> +  int i;
>>>>>>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>>>>>>> +
>>>>>>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>>>>>>> +    {
>>>>>>> +      gimple def_stmt;
>>>>>>> +      basic_block bb;
>>>>>>> +      if (TREE_CODE (oe->op) != SSA_NAME)
>>>>>>> +        continue;
>>>>>>> +      def_stmt = SSA_NAME_DEF_STMT (oe->op);
>>>>>>> +      bb = gimple_bb (def_stmt);
>>>>>>> +      if (!pointer_set_contains (seen_bbs, bb))
>>>>>>> +        {
>>>>>>> +          assign_uids (bb);
>>>>>>> +          pointer_set_insert (seen_bbs, bb);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +  pointer_set_destroy (seen_bbs);
>>>>>>> +}
>>>>>>
>>>>>> Please assign UIDs once using the existing renumber_gimple_stmt_uids ().
>>>>>> You seem to call the above multiple times and thus do more work than
>>>>>> O(number of basic blocks).
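For context, the point of dense per-BB UIDs is to make intra-block ordering
queries O(1). A minimal sketch of such a query (the helper name is ours, not
the patch's; it assumes the UIDs set up by assign_uids above):

    /* Return true if A is executed before B.  Both stmts must belong to
       the same basic block, whose stmts carry monotonically increasing
       UIDs as assigned by assign_uids.  */
    static bool
    appears_before_p (gimple a, gimple b)
    {
      gcc_checking_assert (gimple_bb (a) == gimple_bb (b));
      return gimple_uid (a) < gimple_uid (b);
    }

This is the kind of comparison ensure_ops_are_available (below) relies on
whenever a def and the current insertion point share a block.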
>>>>>
>>>>> The reason I call the above multiple times is that gsi_move_before
>>>>> might get called between two calls to the above. For instance, after
>>>>> rewrite_expr_tree is called once, the following sequence of calls
>>>>> could happen: reassociate_bb -> linearize_expr_tree -> linearize_expr
>>>>> -> gsi_move_before. So it is not sufficient to call
>>>>> renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to
>>>>> use renumber_gimple_stmt_uids_in_blocks instead of
>>>>> assign_uids_in_relevant_bbs?
>>>>
>>>> It's sufficient to call it once if you conservatively update UIDs at stmt
>>>> move / insert time (assign the same UID as the stmt before/after).
>>>
>>> I was worried that that would make the code brittle, but looking at
>>> the places where a new statement is added or moved, I think it is
>>> fine. I have made the change in the new patch.
>>
>> Thanks.
>>
>>>>>
>>>>>
>>>>>>> +/* Ensure that operands in the OPS vector starting from the OPINDEXth
>>>>>>> +   entry are live at STMT.  This is accomplished by moving STMT if
>>>>>>> +   needed.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +  int len = VEC_length (operand_entry_t, ops);
>>>>>>> +  gimple insert_stmt = stmt;
>>>>>>> +  basic_block insert_bb = gimple_bb (stmt);
>>>>>>> +  gimple_stmt_iterator gsi_insert, gsistmt;
>>>>>>> +  for (i = opindex; i < len; i++)
>>>>>>> +    {
>>>>>>
>>>>>> Likewise you call this for each call to rewrite_expr_tree, so it seems
>>>>>> to me this is quadratic in the number of ops in the op vector.
>>>>>
>>>>> The call to ensure_ops_are_available inside rewrite_expr_tree is
>>>>> guarded by if (!moved) and I set moved = true there to ensure that
>>>>> ensure_ops_are_available is called once per reassociation of an
>>>>> expression tree.
>>>>
>>>> It would be a lot easier to understand this function if you call it once
>>>> after rewrite_expr_tree has finished.
>>>
>>> I think you meant "before rewrite_expr_tree". One thing to note here
>>> is that this function is called only when we first see a statement whose
>>> operand changes after reassociation. That's the reason the moving of
>>> statements happens inside rewrite_expr_tree.
>>
>> I meant after, because rewrite_expr_tree ends up swapping things. But
>> I note you changed that in your revised patch - good ;)
>>
>> So now that rewrite_expr_tree no longer changes the order of the ops
>> vector, we can indeed move stmts before it, but backwards. Thus,
>> instead of walking forward and
>>
>> +              /* Ensure that STMT1 is moved to a place where all operands in
>> +                 OPS[opindex + i...] are available.  */
>> +              ensure_ops_are_available (stmt1, ops, opindex + i);
>>
>> we can walk backward and only need to consider the immediate next operand,
>> because we know all further operands are available at the place where the
>> immediate next operand is defined.
>>
>> Which should remove the need for the loop in ensure_ops_are_available
>>
>> +  for (i = opindex; i < len; i++)
>> +    {
>>
>> Correct?  (We may need to special-case the last (two?) stmts.)
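One way to read this suggestion, as a sketch (ours, not the submitted patch):
with a backward walk, each chain statement is checked against the def of the
single next operand only; by induction, all later-ranked operands are already
available at that def. Only the same-BB case is shown; the cross-BB dominance
and EH handling from ensure_ops_are_available would still apply.

    static void
    ensure_next_op_is_available (gimple stmt, operand_entry_t next)
    {
      gimple def_stmt;
      gimple_stmt_iterator gsistmt, gsidef;

      if (TREE_CODE (next->op) != SSA_NAME
          || SSA_NAME_IS_DEFAULT_DEF (next->op))
        return;
      def_stmt = SSA_NAME_DEF_STMT (next->op);
      /* Nothing to do unless the def follows STMT in the same block.  */
      if (gimple_bb (def_stmt) != gimple_bb (stmt)
          || gimple_uid (def_stmt) < gimple_uid (stmt))
        return;
      /* Sink STMT after the def, updating its UID conservatively so
         that later UID comparisons stay valid.  */
      gsistmt = gsi_for_stmt (stmt);
      gsidef = gsi_for_stmt (def_stmt);
      gimple_set_uid (stmt, gimple_uid (def_stmt));
      gsi_move_after (&gsistmt, &gsidef);
    }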
>
> Sorry, I am not able to understand your proposal here. The confusion
> relates to walking forward vs. backward. In the current patch, I start
> from the last statement (the root of the expression tree) and follow
> the chain of defs using SSA_NAME_DEF_STMT (which I consider walking
> backward). Do you want to start from the leaf of the expression and
> follow the chain in the opposite direction until the stmt defining the
> root of the tree is reached? That would indeed avoid comparing each
> statement against all statements that define the ops array, as it is
> sufficient to compare against the statements that define the RHS
> operands. If that is indeed what you are proposing, I tried that
> approach and ran into issues. Doing this would mean that the IR is in
> an inconsistent state during intermediate stages. For instance,
> transforming
>
> t1 = a + b
> t2 = t1 + c
> d = ...
> t3 = t2 + d
>
> into
>
> t1 = a + d
> t2 = t1 + c
> d = ...
> t3 = t2 + b
>
> would result in an intermediate state where t1 = a + d uses d before d
> is defined. I don't know if that is the root cause, but I noticed that
> doing so resulted in verify_ssa complaining about DEBUG statements
> whose operands appeared before their defs. If this is not what you mean
> by walking backwards, could you please explain with a simple example?

Yes, that's what I mean by walking backwards. DEBUG stmts need to be
reset when they are no longer valid. reassoc suffers from many
(existing) debug issues.

Richard.
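As for the DEBUG statement breakage: the conservative fix when a moved stmt
invalidates debug binds is to drop those binds rather than leave wrong
locations behind. A sketch using the existing reset_debug_uses from
tree-ssa.c; the exact placement here is illustrative only:

    /* STMT is about to be moved past debug stmts that use its defs;
       reset those debug uses so var-tracking reports the value as
       optimized away instead of using a stale location.  */
    if (MAY_HAVE_DEBUG_STMTS)
      reset_debug_uses (stmt);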
> Thanks,
> Easwaran
>
>
>>>>
>>>>>>
>>>>>> Why make this all so complicated? It seems to me that we should
>>>>>> fix up stmt order only after the whole ops vector has been materialized.
>>>>>
>>>>>
>>>>>>
>>>>>>> +      operand_entry_t oe = VEC_index (operand_entry_t, ops, i);
>>>>>>> +      gimple def_stmt;
>>>>>>> +      basic_block def_bb;
>>>>>>> +      /* Ignore constants and operands with default definitions.  */
>>>>>>> +      if (TREE_CODE (oe->op) != SSA_NAME
>>>>>>> +          || SSA_NAME_IS_DEFAULT_DEF (oe->op))
>>>>>>> +        continue;
>>>>>>> +      def_stmt = SSA_NAME_DEF_STMT (oe->op);
>>>>>>> +      def_bb = gimple_bb (def_stmt);
>>>>>>> +      if (def_bb != insert_bb
>>>>>>> +          && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb))
>>>>>>> +        {
>>>>>>> +          insert_bb = def_bb;
>>>>>>> +          insert_stmt = def_stmt;
>>>>>>> +        }
>>>>>>> +      else if (def_bb == insert_bb
>>>>>>> +               && gimple_uid (insert_stmt) < gimple_uid (def_stmt))
>>>>>>> +        insert_stmt = def_stmt;
>>>>>>> +    }
>>>>>>> +  if (insert_stmt == stmt)
>>>>>>> +    return;
>>>>>>> +  gsistmt = gsi_for_stmt (stmt);
>>>>>>> +  /* If INSERT_STMT is a phi node, then do not insert just after that
>>>>>>> +     statement.  Instead, find the first non-label gimple statement
>>>>>>> +     in BB and insert before that.  */
>>>>>>> +  if (gimple_code (insert_stmt) == GIMPLE_PHI)
>>>>>>> +    {
>>>>>>> +      gsi_insert = gsi_after_labels (insert_bb);
>>>>>>> +      gsi_move_before (&gsistmt, &gsi_insert);
>>>>>>> +    }
>>>>>>> +  /* Statements marked for throw cannot be in the middle of a basic
>>>>>>> +     block.  So we cannot insert a statement (not marked for throw)
>>>>>>> +     immediately after.  */
>>>>>>> +  else if (lookup_stmt_eh_lp (insert_stmt) > 0
>>>>>>
>>>>>> that's already performed by stmt_can_throw_internal
>>>>>
>>>>> Ok. I was hit by the "statement marked for throw in middle of block"
>>>>> error in verify_gimple_in_cfg and simply copied the sequence of
>>>>> checks that led to that.
>>>>>
>>>>>>
>>>>>>> +           && stmt_can_throw_internal (insert_stmt))
>>>>>>
>>>>>> But all this should be a non-issue as re-assoc should never assign an ops
>>>>>> vector entry for such stmts (but it could have leafs defined by such
>>>>>> stmts). If you only ever move definitions down, like the present code
>>>>>> does, caring about leafs should not be necessary.
>>>>>
>>>>> I added this as I got an ICE in verify_gimple_in_cfg. So it seems like
>>>>> such statements do get assigned into the ops vector.
>>>>
>>>> That would be a bug. See how linearize_expr_tree checks whether
>>>> the defining stmt of an op may throw. That is, it _does_ add the
>>>> SSA def to the vector but does not recurse on its defining stmt operands.
>>>
>>> But as long as the SSA def of a throwing statement is added to the ops
>>> vector, I need the check, even if it doesn't recurse on its operands.
>>> Consider the following:
>>>
>>> <bb1>:
>>> t1 = a + b
>>> c = ... // c might throw
>>>
>>> <bb2>:
>>> t2 = t1 + c
>>>
>>> The expression could be rewritten as (a + c) + b.
>>>
>>> Since c's definition follows t1's, t1 (now a + c) has to be moved after
>>> c = ... That's why I need the check.
>>
>> Hmm, indeed.
>>
>> +      /* There should be exactly one normal edge out of the basic block.  */
>> +      FOR_EACH_EDGE (e, ei, insert_bb->succs)
>> +        {
>> +          if (!(e->flags & EDGE_COMPLEX))
>> +            {
>> +              gcc_assert (succ_edge == NULL);
>> +              succ_edge = e;
>> +            }
>> +        }
>>
>> I think we have sth like find_fallthru_edge for this.
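For reference, the suggested cleanup could look like the fragment below,
reusing the succ_edge variable the patch already declares. One caveat (ours):
find_fallthru_edge in basic-block.h selects on EDGE_FALLTHRU rather than on
!EDGE_COMPLEX, so it approximates, not duplicates, the patch's check.

    /* Instead of the open-coded loop over insert_bb->succs above:  */
    succ_edge = find_fallthru_edge (insert_bb->succs);
    gcc_assert (succ_edge != NULL);
    insert_bb = succ_edge->dest;
    gsi_insert = gsi_after_labels (insert_bb);
    gsi_move_before (&gsistmt, &gsi_insert);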
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Easwaran
>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Easwaran
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> +    {
>>>>>>> +      edge e, succ_edge = NULL;
>>>>>>> +      edge_iterator ei;
>>>>>>> +
>>>>>>> +      /* There should be exactly one normal edge out of the basic block.  */
>>>>>>> +      FOR_EACH_EDGE (e, ei, insert_bb->succs)
>>>>>>> +        {
>>>>>>> +          if (!(e->flags & EDGE_COMPLEX))
>>>>>>> +            {
>>>>>>> +              gcc_assert (succ_edge == NULL);
>>>>>>> +              succ_edge = e;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +      /* Insert STMT at the beginning of the successor basic block.  */
>>>>>>> +      insert_bb = succ_edge->dest;
>>>>>>> +      gsi_insert = gsi_after_labels (insert_bb);
>>>>>>> +      gsi_move_before (&gsistmt, &gsi_insert);
>>>>>>> +    }
>>>>>>> +  else
>>>>>>> +    {
>>>>>>> +      gsi_insert = gsi_for_stmt (insert_stmt);
>>>>>>> +      gsi_move_after (&gsistmt, &gsi_insert);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>>  /* Recursively rewrite our linearized statements so that the operators
>>>>>>>     match those in OPS[OPINDEX], putting the computation in rank order.  */
>>>>>>> @@ -2262,11 +2384,6 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
>>>>>>>    tree rhs2 = gimple_assign_rhs2 (stmt);
>>>>>>>    operand_entry_t oe;
>>>>>>>
>>>>>>> -  /* If we have three operands left, then we want to make sure the ones
>>>>>>> -     that get the double binary op are chosen wisely.  */
>>>>>>> -  if (opindex + 3 == VEC_length (operand_entry_t, ops))
>>>>>>> -    swap_ops_for_binary_stmt (ops, opindex, stmt);
>>>>>>> -
>>>>>>>    /* The final recursion case for this function is that you have
>>>>>>>       exactly two operations left.
>>>>>>>       If we had exactly one op in the entire list to start with, we
>>>>>>> @@ -2312,19 +2429,17 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
>>>>>>>      {
>>>>>>>        if (!moved)
>>>>>>>          {
>>>>>>> -          gimple_stmt_iterator gsinow, gsirhs1;
>>>>>>> -          gimple stmt1 = stmt, stmt2;
>>>>>>> -          unsigned int count;
>>>>>>> +          gimple stmt1 = stmt;
>>>>>>> +          unsigned int count, i = 1;
>>>>>>>
>>>>>>> -          gsinow = gsi_for_stmt (stmt);
>>>>>>>            count = VEC_length (operand_entry_t, ops) - opindex - 2;
>>>>>>> -          while (count-- != 0)
>>>>>>> +          while (i <= count)
>>>>>>>              {
>>>>>>> -              stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt1));
>>>>>>> -              gsirhs1 = gsi_for_stmt (stmt2);
>>>>>>> -              gsi_move_before (&gsirhs1, &gsinow);
>>>>>>> -              gsi_prev (&gsinow);
>>>>>>> -              stmt1 = stmt2;
>>>>>>> +              stmt1 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt1));
>>>>>>> +              /* Ensure that STMT1 is moved to a place where all operands in
>>>>>>> +                 OPS[opindex + i...] are available.  */
>>>>>>> +              ensure_ops_are_available (stmt1, ops, opindex + i);
>>>>>>> +              i++;
>>>>>>>              }
>>>>>>>            moved = true;
>>>>>>>          }
>>>>>>> @@ -3542,8 +3657,18 @@ reassociate_bb (basic_block bb)
>>>>>>>                && VEC_length (operand_entry_t, ops) > 3)
>>>>>>>              rewrite_expr_tree_parallel (stmt, width, ops);
>>>>>>>            else
>>>>>>> -            rewrite_expr_tree (stmt, 0, ops, false);
>>>>>>> +            {
>>>>>>> +              /* When there are three or more operands left, we want
>>>>>>> +                 to make sure the ones that get the double binary op
>>>>>>> +                 are chosen wisely.  */
>>>>>>> +              int len = VEC_length (operand_entry_t, ops);
>>>>>>> +              if (len >= 3)
>>>>>>> +                swap_ops_for_binary_stmt (ops, len - 3, stmt);
>>>>>>>
>>>>>>> +              assign_uids_in_relevant_bbs (ops);
>>>>>>> +              rewrite_expr_tree (stmt, 0, ops, false);
>>>>>>> +            }
>>>>>>> +
>>>>>>>            /* If we combined some repeated factors into a
>>>>>>>               __builtin_powi call, multiply that result by the
>>>>>>>               reassociated operands.  */