On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi Richard,
>
> Here is updated patch which includes
> (1) split critical edges for aggressive if conversion.
> (2) delete all stuff related to support of critical edge predication.
> (3) only one function - predicate_scalar_phi performs predication.
> (4) function find_phi_replacement_condition was deleted since it was
> included in predicate_scalar_phi for phi with two arguments.
>
> I checked that patch works in stress testing mode, i.e. with
> aggressive if conversion by default.
>
> What is your opinion?

Looks ok overall, but please simply do

  FOR_EACH_EDGE (e, ei, bb->succs)
    if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
      split_edge (e);

for all blocks apart from the latch.

Can you please send a combined patch up to this one?  Looking at
the incremental diff is somewhat hard.  Thus a patch including all
patches from patch1 to this one.

Thanks,
Richard.

>
> Thanks.
> Yuri.
>
> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> Thanks for your reply!
>>>
>>> I didn't understand your point:
>>>
>>> Well, I don't mind splitting all critical edges unconditionally
>>>
>>> but you do it unconditionally in proposed patch.
>>
>> I don't mind means I am fine with it.
>>
>>> Also I assume that
>>> call of split_critical_edges() can break ssa. For example, we can
>>> split headers of loops, loop exit blocks etc.
>>
>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>> be surprised if so but that may be possible.
>>
>>> I prefer to do something
>>> more loop-specialized, e.g. call edge_split() for critical edges
>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>> destination bb belongs to loop).
>>
>> That works for me as well but it is more complicated to implement.
>> Ideally you'd only split one edge if you find a block with only critical
>> predecessors (where we'd currently give up).  But note that this
>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>> will change loop->num_nodes so we have to be more careful in
>> constructing the loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>>
>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>> I have few questions about your comments.
>>>>>
>>>>> 1. You wrote :
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> path
>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>> predicate_extended scalar phi to one function?
>>>>> Please note that if additional flag was not set up (i.e.
>>>>> aggressive_if_conv is false) extended predication is required more
>>>>> compile time since it builds hash_map.
>>>>
>>>> It's compile-time complexity is reasonable enough even for
>>>> non-aggressive if-conversion.
>>>>
>>>>> 2. About critical edge splitting.
>>>>>
>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>> option only; (2) should we split all critical edges.
>>>>> Note that this leads to recomputing of topological order.
>>>>
>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>> do something like
>>>>
>>>> Index: gcc/tree-if-conv.c
>>>> ===================================================================
>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>    if (number_of_loops (fun) <= 1)
>>>>      return 0;
>>>>
>>>> +  bool critical_edges_split_p = false;
>>>>    FOR_EACH_LOOP (loop, 0)
>>>>      if (flag_tree_loop_if_convert == 1
>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>             && !loop->dont_vectorize))
>>>> -      todo |= tree_if_conversion (loop);
>>>> +      {
>>>> +       if (!critical_edges_split_p)
>>>> +         {
>>>> +           split_critical_edges ();
>>>> +           critical_edges_split_p = true;
>>>> +           todo |= TODO_cleanup_cfg;
>>>> +         }
>>>> +       todo |= tree_if_conversion (loop);
>>>> +      }
>>>>
>>>>  #ifdef ENABLE_CHECKING
>>>>    {
>>>>
>>>>> It is worth noting that in current implementation bb's with 2
>>>>> predecessors and both are on critical edges are accepted without
>>>>> additional option.
>>>>
>>>> Yes, I know.
>>>>
>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>> to it and even fix the critical edge missed optimization with splitting
>>>> critical edges then I am all for that solution.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>> Yuri.
>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is updated patch2 with the following changes:
>>>>>>> 1. Delete functions  phi_has_two_different_args and 
>>>>>>> find_insertion_point.
>>>>>>> 2. Use only one function for extended predication -
>>>>>>> predicate_extended_scalar_phi.
>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>> blocks if it has 2 predecessors and
>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>> and at least one incoming edge
>>>>>>> is critical. This saved iterator can be used by extended phi 
>>>>>>> predication.
>>>>>>>
>>>>>>> Here is motivated test-case which explains this point.
>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>> The problem phi is in bb-7:
>>>>>>>
>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>   {
>>>>>>>     <bb 5>:
>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 9>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>   {
>>>>>>>     <bb 6>:
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 8>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>   {
>>>>>>>     <bb 7>:
>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>     goto <bb 11>;
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>> #if 0
>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>  else
>>>>>>> #endif
>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>
>>>>>>> we will get ICE:
>>>>>>> t5.c: In function 'foo':
>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>  void foo (int n)
>>>>>>>       ^
>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>> _52 = _1 & _3;
>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>
>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>
>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>> by insert_gimplified_predicates.
>>>>>>
>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>> but push those to e->dest which makes this really messy.
>>>>>>
>>>>>> Rather than having a separate phase where we insert all
>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>> predicating a PHI.
>>>>>>
>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>> the printfs properly.
>>>>>>
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> paths.
>>>>>>
>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>> fault but making it even worse is not an option.
>>>>>>
>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>> commit edge insertions before merging the blocks.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> ChangeLog is
>>>>>>>
>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>> statement iterator.
>>>>>>> (bb_insert_point): New function.
>>>>>>> (set_bb_insert_point): New function.
>>>>>>> (has_pred_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>> non-critical incoming edge.
>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>> Change check that block containing reduction statement candidate
>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>> is_cond_scalar_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>         number of predecessors is geater 2 and at least one incoming 
>>>>>>> edge is
>>>>>>> critical.
>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>> Insert predicate computation of BB just after label if
>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>
>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>
>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>   tree predicate;
>>>>>>>>>
>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>
>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>
>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>> works.
>>>>>>>>
>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>> after the PHI we predicate.
>>>>>>>>
>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>
>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>
>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you 
>>>>>>>>>> predicate
>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener 
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments 
>>>>>>>>>>>>> only
>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction 
>>>>>>>>>>>>> in
>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do 
>>>>>>>>>>>>> it
>>>>>>>>>>>>> even if we sort all phi argument values since we still have to 
>>>>>>>>>>>>> produce
>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see 
>>>>>>>>>>>>> comments
>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>
>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>
>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>
>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes 
>>>>>>>>>>>>> has
>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>
>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result 
>>>>>>>>>>>>> can
>>>>>>>>>>>>> have use in this block too, so we can't put predication code to 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> block end.
>>>>>>>>>>>>
>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>
>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to 
>>>>>>>>>>>> such issues,
>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function 
>>>>>>>>>>>> walking
>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener 
>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it 
>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was 
>>>>>>>>>>>>>>> introduced
>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are 
>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI 
>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to 
>>>>>>>>>>>>>>> detect such phi.
>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that 
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not 
>>>>>>>>>>>>>>> critical - it
>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal 
>>>>>>>>>>>>>>> edge
>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the 
>>>>>>>>>>>>>>> beginning of
>>>>>>>>>>>>>>> block. But this is not true for critical edges for which 
>>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>>> computations are  in the block where code for phi predication 
>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced 
>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication 
>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after 
>>>>>>>>>>>>>> value
>>>>>>>>>>>>>> and handle equal values specially in 
>>>>>>>>>>>>>> predicate_extended_scalar_phi?
>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA 
>>>>>>>>>>>>>> names
>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex 
>>>>>>>>>>>>>> CFG
>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other 
>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of 
>>>>>>>>>>>>>> course needs
>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args 
>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this 
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose 
>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means 
>>>>>>>>>>>>>>> non-extended
>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED 
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has 
>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi 
>>>>>>>>>>>>>>> depending on
>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated 
>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just 
>>>>>>>>>>>>>>> after label
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of 
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Reply via email to