Richard, Here is updated patch2 with the following changes: 1. Delete functions phi_has_two_different_args and find_insertion_point. 2. Use only one function for extended predication - predicate_extended_scalar_phi. 3. Save gsi before insertion of predicate computations for basic blocks if it has 2 predecessors and both incoming edges are critical or it gas more than 2 predecessors and at least one incoming edge is critical. This saved iterator can be used by extended phi predication.
Here is motivated test-case which explains this point. Test-case is attached (t5.c) and it must be compiled with -O2 -ftree-loop-vectorize -fopenmp options. The problem phi is in bb-7: bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) { <bb 5>: xmax_edge_18 = xmax_edge_36 + 1; if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 9>; } bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) { <bb 6>: if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 8>; } bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) { <bb 7>: # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> xmax_edge_19 = xmax_edge_39 + 1; goto <bb 11>; } Note that both incoming edges to bb_7 are critical. If we comment out restoring gsi in predicate_all_scalar_phi: #if 0 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) gsi = bb_insert_point (bb); else #endif gsi = gsi_after_labels (bb); we will get ICE: t5.c: In function 'foo': t5.c:9:6: error: definition in block 4 follows the use void foo (int n) ^ for SSA_NAME: _1 in statement: _52 = _1 & _3; t5.c:9:6: internal compiler error: verify_ssa failed smce predicate computations were inserted in bb_7. ChangeLog is 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> * tree-if-conv.c : Include hash-map.h. (struct bb_predicate_s): Add new field to save copy of gimple statement iterator. (bb_insert_point): New function. (set_bb_insert_point): New function. (has_pred_critical_p): New function. (if_convertible_bb_p): Allow bb has more than 2 predecessors if AGGRESSIVE_IF_CONV is true. (if_convertible_bb_p): Delete check that bb has at least one non-critical incoming edge. (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. Allow interchange PHI arguments if EXTENDED is false. Change check that block containing reduction statement candidate is predecessor of phi-block since phi may have more than two arguments. (predicate_scalar_phi): Add new arguments for call of is_cond_scalar_reduction. (get_predicate_for_edge): New function. (struct phi_args_hash_traits): New type. (phi_args_hash_traits::hash): New function. (phi_args_hash_traits::equal_keys): New function. (gen_phi_arg_condition): New function. (predicate_extended_scalar_phi): New function. (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it to true if BB containing phi has more than 2 predecessors or both incoming edges are critical. Invoke find_phi_replacement_condition and predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB has 2 predecessors and both incoming edges are critical or it has more than 2 predecessors and atleast one incoming edge is critical. Use standard gsi_after_labels otherwise. Invoke predicate_extended_scalar_phi if EXTENDED is true. (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION to save gsi before insertion of predicate computations. SEt-up it to true for BB with 2 predecessors and critical incoming edges either number of predecessors is geater 2 and at least one incoming edge is critical. Add check that non-predicated block may have statements to insert. Insert predicate computation of BB just after label if EXTENDED_PREDICATION is true. (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which is copy of inner or outer loop force_vectorize field. 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> I did simple change by saving gsi iterator for each bb that has >> critical edges by adding additional field to bb_predicate_s: >> >> typedef struct bb_predicate_s { >> >> /* The condition under which this basic block is executed. */ >> tree predicate; >> >> /* PREDICATE is gimplified, and the sequence of statements is >> recorded here, in order to avoid the duplication of computations >> that occur in previous conditions. See PR44483. */ >> gimple_seq predicate_gimplified_stmts; >> >> /* Insertion point for blocks having incoming critical edges. */ >> gimple_stmt_iterator gsi; >> } *bb_predicate_p; >> >> and this iterator is saved in insert_gimplified_predicates before >> insertion code for predicate computation. I checked that this fix >> works. > > Huh? I still wonder what the issue is with inserting everything > after the PHI we predicate. > > Well, your updated patch will come with testcases for the testsuite > that will hopefully fail if doing that. > > Richard. > >> >> Now I am implementing merging of predicate_extended.. and >> predicate_arbitrary.. functions as you proposed. >> >> Best regards. >> Yuri. >> >> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Thanks Richard for your quick reply! >>>> >>>> 1. I agree that we can combine predicate_extended_ and >>>> predicate_arbitrary_ to one function as you proposed. >>>> 2. What is your opinion about using more simple decision about >>>> insertion point - if bb has use of phi result insert phi predication >>>> before it and at the bb end otherwise. I assume that critical edge >>>> splitting is not a good decision. >>> >>> Why not always insert before the use? Which would be after labels, >>> what we do for two-arg PHIs. That is, how can it be that you predicate >>> a PHI in BB1 and then for an edge predicate on one of its incoming >>> edges you get SSA uses with defs that are in BB1 itself? That >>> can only happen for backedges but those you can't remove in any case. >>> >>> Richard. >>> >>>> >>>> Best regards. >>>> Yuri. >>>> >>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Hi Richard, >>>>>> >>>>>> I resend you patch1 and patch2 with minor changes: >>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>> I also very sorry that I sent you bad patch. >>>>>> >>>>>> Now let me answer on your questions related to second patch. >>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>> predicate_arbitrary_scalar_phi? >>>>>> >>>>>> Let's consider the following simple test-case: >>>>>> >>>>>> #pragma omp simd safelen(8) >>>>>> for (i=0; i<512; i++) >>>>>> { >>>>>> float t = a[i]; >>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>> res += 1; >>>>>> } >>>>>> >>>>>> we can see the following phi node correspondent to res: >>>>>> >>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>> >>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>> and only one check can be used for phi predication (for reduction in >>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>> even if we sort all phi argument values since we still have to produce >>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>> for predicate_arbitrary_scalar_phi). >>>>> >>>>> How so? We can always use !(condition) for the "last" value, thus >>>>> treat it as an 'else' case. That even works for >>>>> >>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>> >>>>> where the condition for edges 5 and 7 can be computed as >>>>> ! (condition for 3 || condition for 4). >>>>> >>>>> Of course it is worthwhile to also sort single-occurances first >>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>> used for edges 3 and 4 combined. >>>>> >>>>>> 2. Why we need to introduce find_insertion_point? >>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>> only critical incoming edges and both contain code computing edge >>>>>> predicates, e.g. >>>>>> >>>>>> <bb 7>: >>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>> _46 = xmax_17 == xmax_37; >>>>>> _47 = xmax_17 == xmax_27; >>>>>> _48 = _46 & _47; >>>>>> _53 = xmax_17 == xmax_37; >>>>>> _54 = ~_53; >>>>>> _55 = xmax_17 == xmax_27; >>>>>> _56 = _54 & _55; >>>>>> _57 = _48 | _56; >>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>> goto <bb 11>; >>>>>> >>>>>> It is evident that we can not put phi predication at the block >>>>>> beginning but need to put it after predicate computations. >>>>>> Note also that if there are no critical edges for phi arguments >>>>>> insertion point will be "after labels" Note also that phi result can >>>>>> have use in this block too, so we can't put predication code to the >>>>>> block end. >>>>> >>>>> So the issue is that predicate insertion for edge predicates does >>>>> not happen on the edge but somewhere else (generally impossible >>>>> for critical edges unless you split them). >>>>> >>>>> I think I've told you before that I prefer simple solutions to such >>>>> issues, >>>>> like splitting the edge! Certainly not involving a function walking >>>>> GENERIC expressions. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> Let me know if you still have any questions. >>>>>> >>>>>> Best regards. >>>>>> Yuri. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> Here is the second patch related to extended predication. >>>>>>>> Few comments which explain a main goal of design. >>>>>>>> >>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>> lead to less efficient binaries. >>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>> scalar reduction is applied. >>>>>>>> This is correspondent to the following statement: >>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>> New function phi_has_two_different_args was introduced to detect such >>>>>>>> phi. >>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>> least one incoming edge for blocks containing PHI is not critical - it >>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>> are already inserted above this block and >>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>> computations are in the block where code for phi predication must be >>>>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>>>> simply found out the last statement in block defining predicates >>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>> after it (with some minor exceptions). >>>>>>> >>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>> >>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>> >>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>> >>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>> inserted code if you insert after labels just like for the other case. >>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>> >>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>> >>>>>>> Otherwise patch 2 looks ok to me. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>> >>>>>>>> ChangeLog: >>>>>>>> >>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>> non-critical incoming edge. >>>>>>>> (phi_has_two_different_args): New function. >>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>> containing reduction statement candidate is predecessor >>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>> statement before/after gsi point. >>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>> convert_scalar_cond_reduction. >>>>>>>> (get_predicate_for_edge): New function. >>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>> (find_insertion_point): New function. >>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>> EXTENDED value. >>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>> may have statements to insert. Insert predicate of BB just after label >>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>>>>>> is copy of inner or outer loop field force_vectorize.
#define N 512 #define max(x,y) (x) >= (y)? (x) : (y) #define min(x,y) (x) <= (y)? (x) : (y) int c_X[N]; int x_max; int x_min; extern int nx; void foo (int n) { int i, x; int xmin, xmax; int xmin_edge, xmax_edge; x = c_X[0]; xmin = xmax = x; xmin_edge = xmax_edge = 1; #pragma omp simd safelen(8) for (i = 1; i<n; i++) { x = c_X[i]; x = max(min(x,nx),1); if (x == xmin) { xmin_edge++; } if (x == xmax) { xmax_edge++; } else if (x < xmin) { xmin = x; xmin_edge = 1; } else if (x > xmax) { xmax = x; xmax_edge = 1; } } x_max = xmax_edge; x_min = xmin_edge; }
if-conv.patch2.2.1
Description: Binary data