On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi Richard, > > Here is updated patch which includes > (1) split critical edges for aggressive if conversion. > (2) delete all stuff related to support of critical edge predication. > (3) only one function - predicate_scalar_phi performs predication. > (4) function find_phi_replacement_condition was deleted since it was > included in predicate_scalar_phi for phi with two arguments. > > I checked that patch works in stress testing mode, i.e. with > aggressive if conversion by default. > > What is your opinion?
Looks ok overall, but please simply do FOR_EACH_EDGE (e, ei, bb->succs) if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) split_edge (e); for all blocks apart from the latch. Can you please send a combined patch up to this one? Looking at the incremental diff is somewhat hard. Thus a patch including all patches from patch1 to this one. Thanks, Richard. > > Thanks. > Yuri. > > 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> Thanks for your reply! >>> >>> I didn't understand your point: >>> >>> Well, I don't mind splitting all critical edges unconditionally >>> >>> but you do it unconditionally in proposed patch. >> >> I don't mind means I am fine with it. >> >>> Also I assume that >>> call of split_critical_edges() can break ssa. For example, we can >>> split headers of loops, loop exit blocks etc. >> >> How does that "break SSA"? You mean loop-closed SSA? I'd >> be surprised if so but that may be possible. >> >>> I prefer to do something >>> more loop-specialized, e.g. call edge_split() for critical edges >>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>> destination bb belongs to loop). >> >> That works for me as well but it is more complicated to implement. >> Ideally you'd only split one edge if you find a block with only critical >> predecessors (where we'd currently give up). But note that this >> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >> will change loop->num_nodes so we have to be more careful in >> constructing the loop calling if_convertible_bb_p. >> >> Richard. >> >>> >>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> Sorry that I forgot to delete debug dump from my fix. >>>>> I have few questions about your comments. >>>>> >>>>> 1. You wrote : >>>>>> You also still have two functions for PHI predication. And the >>>>>> new extended variant doesn't commonize the 2-args and general >>>>>> path >>>>> Did you mean that I must combine predicate_scalar_phi and >>>>> predicate_extended scalar phi to one function? >>>>> Please note that if additional flag was not set up (i.e. >>>>> aggressive_if_conv is false) extended predication is required more >>>>> compile time since it builds hash_map. >>>> >>>> It's compile-time complexity is reasonable enough even for >>>> non-aggressive if-conversion. >>>> >>>>> 2. About critical edge splitting. >>>>> >>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>> option only; (2) should we split all critical edges. >>>>> Note that this leads to recomputing of topological order. >>>> >>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>> do something like >>>> >>>> Index: gcc/tree-if-conv.c >>>> =================================================================== >>>> --- gcc/tree-if-conv.c (revision 218515) >>>> +++ gcc/tree-if-conv.c (working copy) >>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>> if (number_of_loops (fun) <= 1) >>>> return 0; >>>> >>>> + bool critical_edges_split_p = false; >>>> FOR_EACH_LOOP (loop, 0) >>>> if (flag_tree_loop_if_convert == 1 >>>> || flag_tree_loop_if_convert_stores == 1 >>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>> && !loop->dont_vectorize)) >>>> - todo |= tree_if_conversion (loop); >>>> + { >>>> + if (!critical_edges_split_p) >>>> + { >>>> + split_critical_edges (); >>>> + critical_edges_split_p = true; >>>> + todo |= TODO_cleanup_cfg; >>>> + } >>>> + todo |= tree_if_conversion (loop); >>>> + } >>>> >>>> #ifdef ENABLE_CHECKING >>>> { >>>> >>>>> It is worth noting that in current implementation bb's with 2 >>>>> predecessors and both are on critical edges are accepted without >>>>> additional option. >>>> >>>> Yes, I know. >>>> >>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>> to it and even fix the critical edge missed optimization with splitting >>>> critical edges then I am all for that solution. >>>> >>>> Richard. >>>> >>>>> Thanks ahead. >>>>> Yuri. >>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> Here is updated patch2 with the following changes: >>>>>>> 1. Delete functions phi_has_two_different_args and >>>>>>> find_insertion_point. >>>>>>> 2. Use only one function for extended predication - >>>>>>> predicate_extended_scalar_phi. >>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>> blocks if it has 2 predecessors and >>>>>>> both incoming edges are critical or it gas more than 2 predecessors >>>>>>> and at least one incoming edge >>>>>>> is critical. This saved iterator can be used by extended phi >>>>>>> predication. >>>>>>> >>>>>>> Here is motivated test-case which explains this point. >>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>> The problem phi is in bb-7: >>>>>>> >>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>> { >>>>>>> <bb 5>: >>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>> if (xmax_17 == xmax_27) >>>>>>> goto <bb 7>; >>>>>>> else >>>>>>> goto <bb 9>; >>>>>>> >>>>>>> } >>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>> { >>>>>>> <bb 6>: >>>>>>> if (xmax_17 == xmax_27) >>>>>>> goto <bb 7>; >>>>>>> else >>>>>>> goto <bb 8>; >>>>>>> >>>>>>> } >>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>> { >>>>>>> <bb 7>: >>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>> goto <bb 11>; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out >>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>> #if 0 >>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>> gsi = bb_insert_point (bb); >>>>>>> else >>>>>>> #endif >>>>>>> gsi = gsi_after_labels (bb); >>>>>>> >>>>>>> we will get ICE: >>>>>>> t5.c: In function 'foo': >>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>> void foo (int n) >>>>>>> ^ >>>>>>> for SSA_NAME: _1 in statement: >>>>>>> _52 = _1 & _3; >>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>> >>>>>>> smce predicate computations were inserted in bb_7. >>>>>> >>>>>> The issue is obviously that the predicates have already been emitted >>>>>> in the target BB - that's of course the wrong place. This is done >>>>>> by insert_gimplified_predicates. >>>>>> >>>>>> This just shows how edge predicate handling is broken - we don't >>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>> but push those to e->dest which makes this really messy. >>>>>> >>>>>> Rather than having a separate phase where we insert all >>>>>> gimplified bb predicates we should do that on-demand when >>>>>> predicating a PHI. >>>>>> >>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>> the printfs properly. >>>>>> >>>>>> You also still have two functions for PHI predication. And the >>>>>> new extended variant doesn't commonize the 2-args and general >>>>>> paths. >>>>>> >>>>>> I'm not at all happy with this code. It may be existing if-conv codes >>>>>> fault but making it even worse is not an option. >>>>>> >>>>>> Again - what's wrong with simply splitting critical edges if >>>>>> aggressive_if_conv? I think that would very much simplify >>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>> commit edge insertions before merging the blocks. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> ChangeLog is >>>>>>> >>>>>>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> >>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>> statement iterator. >>>>>>> (bb_insert_point): New function. >>>>>>> (set_bb_insert_point): New function. >>>>>>> (has_pred_critical_p): New function. >>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>> non-critical incoming edge. >>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>> Change check that block containing reduction statement candidate >>>>>>> is predecessor of phi-block since phi may have more than two arguments. >>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>> is_cond_scalar_reduction. >>>>>>> (get_predicate_for_edge): New function. >>>>>>> (struct phi_args_hash_traits): New type. >>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>> (gen_phi_arg_condition): New function. >>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and >>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>> has 2 predecessors and both incoming edges are critical or it has more >>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>> Use standard gsi_after_labels otherwise. >>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>>>>>> to save gsi before insertion of predicate computations. SEt-up it to >>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>> number of predecessors is geater 2 and at least one incoming >>>>>>> edge is >>>>>>> critical. >>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>> Insert predicate computation of BB just after label if >>>>>>> EXTENDED_PREDICATION is true. >>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>> >>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>> >>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>> tree predicate; >>>>>>>>> >>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>> recorded here, in order to avoid the duplication of computations >>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>> >>>>>>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>> } *bb_predicate_p; >>>>>>>>> >>>>>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>>>>> insertion code for predicate computation. I checked that this fix >>>>>>>>> works. >>>>>>>> >>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>> after the PHI we predicate. >>>>>>>> >>>>>>>> Well, your updated patch will come with testcases for the testsuite >>>>>>>> that will hopefully fail if doing that. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> >>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>> >>>>>>>>> Best regards. >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>> >>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication >>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>>>>>> splitting is not a good decision. >>>>>>>>>> >>>>>>>>>> Why not always insert before the use? Which would be after labels, >>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you >>>>>>>>>> predicate >>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>> can only happen for backedges but those you can't remove in any case. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best regards. >>>>>>>>>>> Yuri. >>>>>>>>>>> >>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>> >>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>> >>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>> >>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>> { >>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>> res += 1; >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>> >>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>>>>>> >>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments >>>>>>>>>>>>> only >>>>>>>>>>>>> and only one check can be used for phi predication (for reduction >>>>>>>>>>>>> in >>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do >>>>>>>>>>>>> it >>>>>>>>>>>>> even if we sort all phi argument values since we still have to >>>>>>>>>>>>> produce >>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see >>>>>>>>>>>>> comments >>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>> >>>>>>>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>> >>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>>>>>> >>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>> >>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>> >>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes >>>>>>>>>>>>> has >>>>>>>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>> >>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>> >>>>>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result >>>>>>>>>>>>> can >>>>>>>>>>>>> have use in this block too, so we can't put predication code to >>>>>>>>>>>>> the >>>>>>>>>>>>> block end. >>>>>>>>>>>> >>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>> >>>>>>>>>>>> I think I've told you before that I prefer simple solutions to >>>>>>>>>>>> such issues, >>>>>>>>>>>> like splitting the edge! Certainly not involving a function >>>>>>>>>>>> walking >>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards. >>>>>>>>>>>>> Yuri. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it >>>>>>>>>>>>>>> may >>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was >>>>>>>>>>>>>>> introduced >>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are >>>>>>>>>>>>>>> different >>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI >>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced to >>>>>>>>>>>>>>> detect such phi. >>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that >>>>>>>>>>>>>>> at >>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not >>>>>>>>>>>>>>> critical - it >>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal >>>>>>>>>>>>>>> edge >>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>> code related to PHI predication can be inserted at the >>>>>>>>>>>>>>> beginning of >>>>>>>>>>>>>>> block. But this is not true for critical edges for which >>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>> computations are in the block where code for phi predication >>>>>>>>>>>>>>> must be >>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced >>>>>>>>>>>>>>> which is >>>>>>>>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication >>>>>>>>>>>>>>> code >>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>> >>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>> >>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after >>>>>>>>>>>>>> value >>>>>>>>>>>>>> and handle equal values specially in >>>>>>>>>>>>>> predicate_extended_scalar_phi? >>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All SSA >>>>>>>>>>>>>> names >>>>>>>>>>>>>> required for the predicates are defined upward - and the complex >>>>>>>>>>>>>> CFG >>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate >>>>>>>>>>>>>> the >>>>>>>>>>>>>> inserted code if you insert after labels just like for the other >>>>>>>>>>>>>> case. >>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of >>>>>>>>>>>>>> course needs >>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args >>>>>>>>>>>>>> multiple >>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this >>>>>>>>>>>>>> as >>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose >>>>>>>>>>>>>>> access >>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means >>>>>>>>>>>>>>> non-extended >>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has >>>>>>>>>>>>>>> more >>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi >>>>>>>>>>>>>>> depending on >>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated >>>>>>>>>>>>>>> block >>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just >>>>>>>>>>>>>>> after label >>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.