Hi Richard,

Here is updated patch which includes
(1) split critical edges for aggressive if conversion.
(2) delete all stuff related to support of critical edge predication.
(3) only one function - predicate_scalar_phi performs predication.
(4) function find_phi_replacement_condition was deleted since it was
included in predicate_scalar_phi for phi with two arguments.

I checked that patch works in stress testing mode, i.e. with
aggressive if conversion by default.

What is your opinion?

Thanks.
Yuri.

2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> Thanks for your reply!
>>
>> I didn't understand your point:
>>
>> Well, I don't mind splitting all critical edges unconditionally
>>
>> but you do it unconditionally in proposed patch.
>
> I don't mind means I am fine with it.
>
>> Also I assume that
>> call of split_critical_edges() can break ssa. For example, we can
>> split headers of loops, loop exit blocks etc.
>
> How does that "break SSA"?  You mean loop-closed SSA?  I'd
> be surprised if so but that may be possible.
>
>> I prefer to do something
>> more loop-specialized, e.g. call edge_split() for critical edges
>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>> destination bb belongs to loop).
>
> That works for me as well but it is more complicated to implement.
> Ideally you'd only split one edge if you find a block with only critical
> predecessors (where we'd currently give up).  But note that this
> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
> will change loop->num_nodes so we have to be more careful in
> constructing the loop calling if_convertible_bb_p.
>
> Richard.
>
>>
>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>> wrote:
>>>> Richard,
>>>>
>>>> Sorry that I forgot to delete debug dump from my fix.
>>>> I have few questions about your comments.
>>>>
>>>> 1. You wrote :
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> path
>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>> predicate_extended scalar phi to one function?
>>>> Please note that if additional flag was not set up (i.e.
>>>> aggressive_if_conv is false) extended predication is required more
>>>> compile time since it builds hash_map.
>>>
>>> It's compile-time complexity is reasonable enough even for
>>> non-aggressive if-conversion.
>>>
>>>> 2. About critical edge splitting.
>>>>
>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>> option only; (2) should we split all critical edges.
>>>> Note that this leads to recomputing of topological order.
>>>
>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>> do something like
>>>
>>> Index: gcc/tree-if-conv.c
>>> ===================================================================
>>> --- gcc/tree-if-conv.c  (revision 218515)
>>> +++ gcc/tree-if-conv.c  (working copy)
>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>    if (number_of_loops (fun) <= 1)
>>>      return 0;
>>>
>>> +  bool critical_edges_split_p = false;
>>>    FOR_EACH_LOOP (loop, 0)
>>>      if (flag_tree_loop_if_convert == 1
>>>         || flag_tree_loop_if_convert_stores == 1
>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>             && !loop->dont_vectorize))
>>> -      todo |= tree_if_conversion (loop);
>>> +      {
>>> +       if (!critical_edges_split_p)
>>> +         {
>>> +           split_critical_edges ();
>>> +           critical_edges_split_p = true;
>>> +           todo |= TODO_cleanup_cfg;
>>> +         }
>>> +       todo |= tree_if_conversion (loop);
>>> +      }
>>>
>>>  #ifdef ENABLE_CHECKING
>>>    {
>>>
>>>> It is worth noting that in current implementation bb's with 2
>>>> predecessors and both are on critical edges are accepted without
>>>> additional option.
>>>
>>> Yes, I know.
>>>
>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>> to it and even fix the critical edge missed optimization with splitting
>>> critical edges then I am all for that solution.
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>> Yuri.
>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is updated patch2 with the following changes:
>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>> 2. Use only one function for extended predication -
>>>>>> predicate_extended_scalar_phi.
>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>> blocks if it has 2 predecessors and
>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>> and at least one incoming edge
>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>
>>>>>> Here is motivated test-case which explains this point.
>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>> The problem phi is in bb-7:
>>>>>>
>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>   {
>>>>>>     <bb 5>:
>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 9>;
>>>>>>
>>>>>>   }
>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>   {
>>>>>>     <bb 6>:
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 8>;
>>>>>>
>>>>>>   }
>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>   {
>>>>>>     <bb 7>:
>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>     goto <bb 11>;
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>> #if 0
>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>    gsi = bb_insert_point (bb);
>>>>>>  else
>>>>>> #endif
>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>
>>>>>> we will get ICE:
>>>>>> t5.c: In function 'foo':
>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>  void foo (int n)
>>>>>>       ^
>>>>>> for SSA_NAME: _1 in statement:
>>>>>> _52 = _1 & _3;
>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>
>>>>>> smce predicate computations were inserted in bb_7.
>>>>>
>>>>> The issue is obviously that the predicates have already been emitted
>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>> by insert_gimplified_predicates.
>>>>>
>>>>> This just shows how edge predicate handling is broken - we don't
>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>> but push those to e->dest which makes this really messy.
>>>>>
>>>>> Rather than having a separate phase where we insert all
>>>>> gimplified bb predicates we should do that on-demand when
>>>>> predicating a PHI.
>>>>>
>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>> the printfs properly.
>>>>>
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> paths.
>>>>>
>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>> fault but making it even worse is not an option.
>>>>>
>>>>> Again - what's wrong with simply splitting critical edges if
>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>> commit edge insertions before merging the blocks.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog is
>>>>>>
>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>> statement iterator.
>>>>>> (bb_insert_point): New function.
>>>>>> (set_bb_insert_point): New function.
>>>>>> (has_pred_critical_p): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>> non-critical incoming edge.
>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>> Change check that block containing reduction statement candidate
>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>> is_cond_scalar_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (struct phi_args_hash_traits): New type.
>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>> (gen_phi_arg_condition): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>> Use standard gsi_after_labels otherwise.
>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>         number of predecessors is geater 2 and at least one incoming 
>>>>>> edge is
>>>>>> critical.
>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>> Insert predicate computation of BB just after label if
>>>>>> EXTENDED_PREDICATION is true.
>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>
>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>
>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>   tree predicate;
>>>>>>>>
>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>
>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>> } *bb_predicate_p;
>>>>>>>>
>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>> works.
>>>>>>>
>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>> after the PHI we predicate.
>>>>>>>
>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>> that will hopefully fail if doing that.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>
>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>
>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you 
>>>>>>>>> predicate
>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener 
>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev 
>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>
>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>
>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>
>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments 
>>>>>>>>>>>> only
>>>>>>>>>>>> and only one check can be used for phi predication (for reduction 
>>>>>>>>>>>> in
>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>> even if we sort all phi argument values since we still have to 
>>>>>>>>>>>> produce
>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see 
>>>>>>>>>>>> comments
>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>
>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>
>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>
>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>
>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>
>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>
>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result 
>>>>>>>>>>>> can
>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>> block end.
>>>>>>>>>>>
>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>
>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such 
>>>>>>>>>>> issues,
>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener 
>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev 
>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it 
>>>>>>>>>>>>>> may
>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was 
>>>>>>>>>>>>>> introduced
>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are 
>>>>>>>>>>>>>> different
>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI 
>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to 
>>>>>>>>>>>>>> detect such phi.
>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not 
>>>>>>>>>>>>>> critical - it
>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal 
>>>>>>>>>>>>>> edge
>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning 
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> block. But this is not true for critical edges for which 
>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>> computations are  in the block where code for phi predication 
>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced 
>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication 
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>
>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>
>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after 
>>>>>>>>>>>>> value
>>>>>>>>>>>>> and handle equal values specially in 
>>>>>>>>>>>>> predicate_extended_scalar_phi?
>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA 
>>>>>>>>>>>>> names
>>>>>>>>>>>>> required for the predicates are defined upward - and the complex 
>>>>>>>>>>>>> CFG
>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> inserted code if you insert after labels just like for the other 
>>>>>>>>>>>>> case.
>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of 
>>>>>>>>>>>>> course needs
>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args 
>>>>>>>>>>>>> multiple
>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose 
>>>>>>>>>>>>>> access
>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means 
>>>>>>>>>>>>>> non-extended
>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED 
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending 
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated 
>>>>>>>>>>>>>> block
>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after 
>>>>>>>>>>>>>> label
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE 
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Attachment: if-conv.patch2.3
Description: Binary data

Reply via email to