On Mon, Jul 20, 2015 at 5:05 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi Jeff!
>
> Thanks for your details comments.
>
> You asked:
> How are conditionals on vectors usually handled?  You should try to
> mimick that and report, with detail, why that's not working.
>
> In current implementation of vectorization pass all comparisons are
> handled through COND_EXPR, i.e. vect-pattern pass transforms all
> comparisons producing bool values to conditional expressions like a[i]
> != 0  --> a[i]!=0? 1: 0 which vectorizers transforms to
> vect-cond-expr. So we don't have operations with vector operands and
> scalar (bool) result.
> To implement such operations I introduced target-hook. Could you
> propose another solution implementing it?

see below...

> Thanks.
> Yuri.
>
> 2015-07-10 8:51 GMT+03:00 Jeff Law <l...@redhat.com>:
>> On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote:
>>>
>>> Richard,
>>>
>>> Here is updated patch which does not include your proposal related to
>>> the target hook deletion.
>>> You wrote:
>>>>
>>>> I still don't understand why you need the new target hook.  If we have a
>>>> masked
>>>> load/store then the mask is computed by an assignment with a
>>>> VEC_COND_EXPR
>>>> (in your example) and thus a test for a zero mask is expressible as just
>>>>
>>>   > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... })
>>>>
>>>>
>>>> or am I missing something?
>>>
>>> Such vector compare produces vector and does not set up cc flags
>>> required for conditional branch (GIMPLE_COND).
>>> If we use such comparison for GIMPLE_COND we got error message, so I
>>> used target hook which does set up cc flags aka ptest instruction and
>>> I left this part.

... what error message did you get?

I suppose you ran into IL checking complaining about

error: vector comparison returning a boolean

?

It looks like we forbade equality compares for consistency thus you'd need
to use

  vec_ifc_tem_4 = VIEW_CONVERT_EXPR<TImode> (vect_ifc_41.17_167);
  if (vec_ifc_tem_4 == 0)
    ...

instead.   Thus view-convert to an integer mode of the same size as the vector.

Or we allow equality compares of vectors.  Not sure if expansion /
targets handle
them well though - you'd have to check.

Richard.

>> I think we need to look for something better.  I really don't like the idea
>> of using a target hook like this within the gimple optimizers unless it's
>> absolutely necessary.
>>
>> How are conditionals on vectors usually handled?  You should try to mimick
>> that and report, with detail, why that's not working.
>>
>> I'm also not happy with the mechanisms to determine whether or not we should
>> make this transformation.  I'm willing to hash through other details and
>> come back to this issue once we've got some of the other stuff figured out.
>> I guess a new flag with the target adjusting is the fallback if we can't
>> come up with some reasonable way to select this on or off.
>>
>> The reason I don't like having the target files make this kind of decision
>> is it makes more gimple transformations target dependent. Based on the
>> history with RTL, that ultimately leads to an unwieldy mess.
>>
>> And yes, I know gimple isn't 100% target independent -- but that doesn't
>> mean we should keep adding more target dependencies.  Each one we add needs
>> to be well vetted.
>>
>>
>> patch.3
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 44a8624..e90de32 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-iterator.h"
>>  #include "tree-chkp.h"
>>  #include "rtl-chkp.h"
>> +#include "stringpool.h"
>> +#include "tree-ssanames.h"
>> So ideally we'll figure out why you're unable to generate a suitable
>> conditional at the gimple level and the need for the x86 backend to generate
>> the vector zero test will go away.  And if it does go away, we want those
>> #includes to disappear.
>>
>> Additionally, instead of including stringpool.h & tree-ssanames.h, include
>> "ssa.h" -- as a general rule.
>>
>>
>>  static rtx legitimize_dllimport_symbol (rtx, bool);
>>  static rtx legitimize_pe_coff_extern_decl (rtx, bool);
>> @@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree
>> mem_vectype,
>>    return ix86_get_builtin (code);
>>  }
>>
>> +/* Returns true if SOURCE type is supported by builtin ptest.
>> +   NAME is lhs of created ptest call.  All created statements are added
>> +   to GS.  */
>> +
>> +static bool
>> +ix86_vectorize_build_zero_vector_test (tree source, tree name,
>>
>> Given the stated goal of not doing this in the target files, I'm not going
>> to actually look at this routine or any of the infrastructure for this
>> target hook.
>>
>>
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 3e4ba3a..9e8de11 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
>>            "Maximum number of conditional store pairs that can be sunk",
>>            2, 0, 0)
>>
>> +/* Enable inserion test on zero mask for masked stores if non-zero.  */
>> s/inserion/insertion/
>>
>> +DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK,
>> +         "zero-test-for-store-mask",
>> +         "Enable insertion of test on zero mask for masked stores",
>> +         1, 0, 1)
>> I'm resisting the temptation to bike-shed...  I don't like the name, but I
>> don't have a better one yet.  Similarly for the short description.
>>
>>
>> +/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" }
>> } */
>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>> cleanup-tree-dump is no longer needed.
>>
>>
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 7ba0d8f..e31479b 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt,
>> gimple_stmt_iterator *gsi,
>>      {
>>        tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
>>        prev_stmt_info = NULL;
>> +      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
>>        for (i = 0; i < ncopies; i++)
>> If we need to keep this field, can you initialize it just after the
>> STMT_VINFO_GATHER_P test after the /** Transform **/ comment.  It seems
>> kind-of buried and hard to find in this location.
>>
>> I'm not really sure why Richi has objected to the field.  Yea we can
>> re-analyze stuff in the vectorizer, but we'd be repeating a fair number of
>> tests (presumably we'd refactor the start of vectorizable_mask_load_store to
>> avoid duplication).  That seems more wasteful than another field.
>>
>>
>>
>>         {
>>           unsigned align, misalign;
>> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
>> index ff46a52..debb351 100644
>> --- a/gcc/tree-vectorizer.c
>> +++ b/gcc/tree-vectorizer.c
>> @@ -92,7 +92,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "dbgcnt.h"
>>  #include "gimple-fold.h"
>>  #include "tree-scalar-evolution.h"
>> -
>> +#include "stringpool.h"
>> +#include "tree-ssanames.h"
>> In general, rather than including stringpool.h and tree-ssanames.h, just
>> include "ssa.h".
>>
>>
>>
>>  /* Loop or bb location.  */
>>  source_location vect_location;
>> @@ -422,6 +423,184 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple
>> loop_vectorized_call)
>>    free (bbs);
>>  }
>>
>> +/* Helper for optimize_mask_stores: returns true if STMT sinking to end
>> +   of BB is valid and false otherwise.  */
>> +
>> +static bool
>> +is_valid_sink (gimple stmt, gimple last_store)
>> +{
>> +  tree vdef;
>> +  imm_use_iterator imm_it;
>> +  gimple use_stmt;
>> +  basic_block bb = gimple_bb (stmt);
>> +
>> +  if ((vdef = gimple_vdef (stmt)))
>> +    {
>> +      int cross = 0;
>> +      /* Check that ther are no store vuses in current bb.  */
>> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_it, vdef)
>> +       if (gimple_bb (use_stmt) == bb)
>> +         cross++;
>> Seems to me that you should get rid of "cross" and just "return false" here.
>> There's no need to keep looking at the immediate uses once you've found one
>> in the same block.  That also allows you to simplify this following code ...
>>
>> +
>> +      if (cross == 0)
>> +       return true;
>> Eliminate the conditional and return true here.
>>
>>
>> +    }
>> +  else if (gimple_vuse (stmt) == NULL_TREE)
>> +    return true;
>> +  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
>> +    return true;
>> +  return false;
>> +}
>> +
>> +/* The code below is trying to perform simple optimization - do not execute
>> +   masked store statement if its mask is zero mask since loads that follow
>> +   a masked store can be blocked.
>> +   It put all masked stores statements with the same mask into the new bb
>> +   with a check on zero mask.  */
>> I suspect you need to re-wrap the comment to 80 columns.
>>
>>
>> +
>> +  /* Loop has masked stores.  */
>> +  while (!worklist.is_empty ())
>> +    {
>> +      gimple last, def_stmt, last_store;
>> +      edge e, efalse;
>> +      tree mask, val;
>> +      basic_block store_bb, join_bb;
>> +      gimple_stmt_iterator gsi_to;
>> +      gimple_seq gs;
>> +      tree arg3;
>> +      tree vdef, new_vdef;
>> +      gphi *phi;
>> +      bool first_dump;
>> +
>> +      last = worklist.pop ();
>> +      mask = gimple_call_arg (last, 2);
>> Are there ever cases where the mask is a compile-time constant?  I wouldn't
>> think so, but I'm totally unfamiliar with all the tree-vector code.
>>
>>
>> +      val = make_ssa_name (integer_type_node);
>> +      gs = NULL;
>> +      /* Escape if target does not support test on zero mask.  */
>> +
>> +      if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs))
>> +       {
>> +         if (dump_enabled_p ())
>> +           dump_printf (MSG_NOTE, "Target does not support ptest!\n");
>> +         return;
>> +       }
>> +      gcc_assert (gs != NULL);
>> As noted earlier, I think we need to revisit this and look for a way to do
>> this test without calling into bits in the target files.  I'd like to see
>> the gimple you tried to generate and any error messages that were spit out
>> when you did so.
>>
>>
>>
>>
>> +
>> +      /* Create new bb.  */
>> +      e = split_block (bb, last);
>> +      join_bb = e->dest;
>> +      store_bb = create_empty_bb (bb);
>> +      add_bb_to_loop (store_bb, loop);
>> When we split BB via split_block, is the newly created block (JOIN_BB)
>> automatically added to the loop?
>>
>>
>>
>>
>>
>> +      e->flags = EDGE_TRUE_VALUE;
>> +      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>> +      /* Don't want to put STORE_BB to unlikely part and use
>> +        50% probability.  */
>> +      store_bb->frequency = bb->frequency / 2;
>> +      efalse->probability = REG_BR_PROB_BASE / 2;
>> +      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
>> Like Richi, I'd expect that in practice STORE_BB will be the more likely
>> taken, how much so, I don't know, but 50-50 is probably not right.  I'd
>> think this would affect the ultimate block layout to some degree as well.
>> Do we want the straightline path to include STORE_BB (which implies a
>> branch-around STORE_BB and inverting the test from what you're doing now).
>>
>>
>>
>>
>> +      /* Create new PHI node for vdef of the last masked store:
>> +        .MEM_2 = VDEF <.MEM_1>
>> +        will be converted to
>> +        .MEM.3 = VDEF <.MEM_1>
>> +        and new PHI node will be created in join bb
>> +        .MEM_2 = PHI <.MEM_1, .MEM_3>
>> +      */
>> +      vdef = gimple_vdef (last);
>> +      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
>> +      new_vdef = make_ssa_name (gimple_vop (cfun), last);
>> +      phi = create_phi_node (vdef, join_bb);
>> +      add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0),
>> UNKNOWN_LOCATION);
>> +      gimple_set_vdef (last, new_vdef);
>> +      first_dump = true;
>> Presumably you don't add the phi arg associated with the edge BB->JOIN_BB
>> because you don't know it until the end of the loop below, right?
>>
>>
>> Overall I think this is pretty close.  I'd like you to focus on getting the
>> test for a zero store mask sorted out.
>>
>> Jeff
>>
>>

Reply via email to