On Mon, Jul 20, 2015 at 5:05 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi Jeff! > > Thanks for your details comments. > > You asked: > How are conditionals on vectors usually handled? You should try to > mimick that and report, with detail, why that's not working. > > In current implementation of vectorization pass all comparisons are > handled through COND_EXPR, i.e. vect-pattern pass transforms all > comparisons producing bool values to conditional expressions like a[i] > != 0 --> a[i]!=0? 1: 0 which vectorizers transforms to > vect-cond-expr. So we don't have operations with vector operands and > scalar (bool) result. > To implement such operations I introduced target-hook. Could you > propose another solution implementing it?
see below... > Thanks. > Yuri. > > 2015-07-10 8:51 GMT+03:00 Jeff Law <l...@redhat.com>: >> On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote: >>> >>> Richard, >>> >>> Here is updated patch which does not include your proposal related to >>> the target hook deletion. >>> You wrote: >>>> >>>> I still don't understand why you need the new target hook. If we have a >>>> masked >>>> load/store then the mask is computed by an assignment with a >>>> VEC_COND_EXPR >>>> (in your example) and thus a test for a zero mask is expressible as just >>>> >>> > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... }) >>>> >>>> >>>> or am I missing something? >>> >>> Such vector compare produces vector and does not set up cc flags >>> required for conditional branch (GIMPLE_COND). >>> If we use such comparison for GIMPLE_COND we got error message, so I >>> used target hook which does set up cc flags aka ptest instruction and >>> I left this part. ... what error message did you get? I suppose you ran into IL checking complaining about error: vector comparison returning a boolean ? It looks like we forbade equality compares for consistency thus you'd need to use vec_ifc_tem_4 = VIEW_CONVERT_EXPR<TImode> (vect_ifc_41.17_167); if (vec_ifc_tem_4 == 0) ... instead. Thus view-convert to an integer mode of the same size as the vector. Or we allow equality compares of vectors. Not sure if expansion / targets handle them well though - you'd have to check. Richard. >> I think we need to look for something better. I really don't like the idea >> of using a target hook like this within the gimple optimizers unless it's >> absolutely necessary. >> >> How are conditionals on vectors usually handled? You should try to mimick >> that and report, with detail, why that's not working. >> >> I'm also not happy with the mechanisms to determine whether or not we should >> make this transformation. I'm willing to hash through other details and >> come back to this issue once we've got some of the other stuff figured out. >> I guess a new flag with the target adjusting is the fallback if we can't >> come up with some reasonable way to select this on or off. >> >> The reason I don't like having the target files make this kind of decision >> is it makes more gimple transformations target dependent. Based on the >> history with RTL, that ultimately leads to an unwieldy mess. >> >> And yes, I know gimple isn't 100% target independent -- but that doesn't >> mean we should keep adding more target dependencies. Each one we add needs >> to be well vetted. >> >> >> patch.3 >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 44a8624..e90de32 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-iterator.h" >> #include "tree-chkp.h" >> #include "rtl-chkp.h" >> +#include "stringpool.h" >> +#include "tree-ssanames.h" >> So ideally we'll figure out why you're unable to generate a suitable >> conditional at the gimple level and the need for the x86 backend to generate >> the vector zero test will go away. And if it does go away, we want those >> #includes to disappear. >> >> Additionally, instead of including stringpool.h & tree-ssanames.h, include >> "ssa.h" -- as a general rule. >> >> >> static rtx legitimize_dllimport_symbol (rtx, bool); >> static rtx legitimize_pe_coff_extern_decl (rtx, bool); >> @@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree >> mem_vectype, >> return ix86_get_builtin (code); >> } >> >> +/* Returns true if SOURCE type is supported by builtin ptest. >> + NAME is lhs of created ptest call. All created statements are added >> + to GS. */ >> + >> +static bool >> +ix86_vectorize_build_zero_vector_test (tree source, tree name, >> >> Given the stated goal of not doing this in the target files, I'm not going >> to actually look at this routine or any of the infrastructure for this >> target hook. >> >> >> diff --git a/gcc/params.def b/gcc/params.def >> index 3e4ba3a..9e8de11 100644 >> --- a/gcc/params.def >> +++ b/gcc/params.def >> @@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK, >> "Maximum number of conditional store pairs that can be sunk", >> 2, 0, 0) >> >> +/* Enable inserion test on zero mask for masked stores if non-zero. */ >> s/inserion/insertion/ >> >> +DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK, >> + "zero-test-for-store-mask", >> + "Enable insertion of test on zero mask for masked stores", >> + 1, 0, 1) >> I'm resisting the temptation to bike-shed... I don't like the name, but I >> don't have a better one yet. Similarly for the short description. >> >> >> +/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } >> } */ >> +/* { dg-final { cleanup-tree-dump "vect" } } */ >> cleanup-tree-dump is no longer needed. >> >> >> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c >> index 7ba0d8f..e31479b 100644 >> --- a/gcc/tree-vect-stmts.c >> +++ b/gcc/tree-vect-stmts.c >> @@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, >> gimple_stmt_iterator *gsi, >> { >> tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; >> prev_stmt_info = NULL; >> + LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; >> for (i = 0; i < ncopies; i++) >> If we need to keep this field, can you initialize it just after the >> STMT_VINFO_GATHER_P test after the /** Transform **/ comment. It seems >> kind-of buried and hard to find in this location. >> >> I'm not really sure why Richi has objected to the field. Yea we can >> re-analyze stuff in the vectorizer, but we'd be repeating a fair number of >> tests (presumably we'd refactor the start of vectorizable_mask_load_store to >> avoid duplication). That seems more wasteful than another field. >> >> >> >> { >> unsigned align, misalign; >> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c >> index ff46a52..debb351 100644 >> --- a/gcc/tree-vectorizer.c >> +++ b/gcc/tree-vectorizer.c >> @@ -92,7 +92,8 @@ along with GCC; see the file COPYING3. If not see >> #include "dbgcnt.h" >> #include "gimple-fold.h" >> #include "tree-scalar-evolution.h" >> - >> +#include "stringpool.h" >> +#include "tree-ssanames.h" >> In general, rather than including stringpool.h and tree-ssanames.h, just >> include "ssa.h". >> >> >> >> /* Loop or bb location. */ >> source_location vect_location; >> @@ -422,6 +423,184 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple >> loop_vectorized_call) >> free (bbs); >> } >> >> +/* Helper for optimize_mask_stores: returns true if STMT sinking to end >> + of BB is valid and false otherwise. */ >> + >> +static bool >> +is_valid_sink (gimple stmt, gimple last_store) >> +{ >> + tree vdef; >> + imm_use_iterator imm_it; >> + gimple use_stmt; >> + basic_block bb = gimple_bb (stmt); >> + >> + if ((vdef = gimple_vdef (stmt))) >> + { >> + int cross = 0; >> + /* Check that ther are no store vuses in current bb. */ >> + FOR_EACH_IMM_USE_STMT (use_stmt, imm_it, vdef) >> + if (gimple_bb (use_stmt) == bb) >> + cross++; >> Seems to me that you should get rid of "cross" and just "return false" here. >> There's no need to keep looking at the immediate uses once you've found one >> in the same block. That also allows you to simplify this following code ... >> >> + >> + if (cross == 0) >> + return true; >> Eliminate the conditional and return true here. >> >> >> + } >> + else if (gimple_vuse (stmt) == NULL_TREE) >> + return true; >> + else if (gimple_vuse (stmt) == gimple_vuse (last_store)) >> + return true; >> + return false; >> +} >> + >> +/* The code below is trying to perform simple optimization - do not execute >> + masked store statement if its mask is zero mask since loads that follow >> + a masked store can be blocked. >> + It put all masked stores statements with the same mask into the new bb >> + with a check on zero mask. */ >> I suspect you need to re-wrap the comment to 80 columns. >> >> >> + >> + /* Loop has masked stores. */ >> + while (!worklist.is_empty ()) >> + { >> + gimple last, def_stmt, last_store; >> + edge e, efalse; >> + tree mask, val; >> + basic_block store_bb, join_bb; >> + gimple_stmt_iterator gsi_to; >> + gimple_seq gs; >> + tree arg3; >> + tree vdef, new_vdef; >> + gphi *phi; >> + bool first_dump; >> + >> + last = worklist.pop (); >> + mask = gimple_call_arg (last, 2); >> Are there ever cases where the mask is a compile-time constant? I wouldn't >> think so, but I'm totally unfamiliar with all the tree-vector code. >> >> >> + val = make_ssa_name (integer_type_node); >> + gs = NULL; >> + /* Escape if target does not support test on zero mask. */ >> + >> + if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf (MSG_NOTE, "Target does not support ptest!\n"); >> + return; >> + } >> + gcc_assert (gs != NULL); >> As noted earlier, I think we need to revisit this and look for a way to do >> this test without calling into bits in the target files. I'd like to see >> the gimple you tried to generate and any error messages that were spit out >> when you did so. >> >> >> >> >> + >> + /* Create new bb. */ >> + e = split_block (bb, last); >> + join_bb = e->dest; >> + store_bb = create_empty_bb (bb); >> + add_bb_to_loop (store_bb, loop); >> When we split BB via split_block, is the newly created block (JOIN_BB) >> automatically added to the loop? >> >> >> >> >> >> + e->flags = EDGE_TRUE_VALUE; >> + efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE); >> + /* Don't want to put STORE_BB to unlikely part and use >> + 50% probability. */ >> + store_bb->frequency = bb->frequency / 2; >> + efalse->probability = REG_BR_PROB_BASE / 2; >> + make_edge (store_bb, join_bb, EDGE_FALLTHRU); >> Like Richi, I'd expect that in practice STORE_BB will be the more likely >> taken, how much so, I don't know, but 50-50 is probably not right. I'd >> think this would affect the ultimate block layout to some degree as well. >> Do we want the straightline path to include STORE_BB (which implies a >> branch-around STORE_BB and inverting the test from what you're doing now). >> >> >> >> >> + /* Create new PHI node for vdef of the last masked store: >> + .MEM_2 = VDEF <.MEM_1> >> + will be converted to >> + .MEM.3 = VDEF <.MEM_1> >> + and new PHI node will be created in join bb >> + .MEM_2 = PHI <.MEM_1, .MEM_3> >> + */ >> + vdef = gimple_vdef (last); >> + gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME); >> + new_vdef = make_ssa_name (gimple_vop (cfun), last); >> + phi = create_phi_node (vdef, join_bb); >> + add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), >> UNKNOWN_LOCATION); >> + gimple_set_vdef (last, new_vdef); >> + first_dump = true; >> Presumably you don't add the phi arg associated with the edge BB->JOIN_BB >> because you don't know it until the end of the loop below, right? >> >> >> Overall I think this is pretty close. I'd like you to focus on getting the >> test for a zero store mask sorted out. >> >> Jeff >> >>