On Fri, Jul 3, 2015 at 11:37 AM, Alan Lawrence <alan.lawre...@arm.com> wrote: > Abe wrote: > >> In other words, the problem about which I was concerned is not going to be >> triggered by e.g. "if (c) x = ..." >> which lacks an attached "else x = ..." in a multithreaded program without >> enough locking just because 'x' is global/static. >> >> The only remaining case to consider is if some code being compiler takes >> the address of something thread-local and then "gives" >> that pointer to another thread. Even for _that_ extreme case, Sebastian >> says that the gimplifier will detect this >> "address has been taken" situation and do the right thing such that the >> new if converter also does the right thing. > > > Great :). I don't understand much/anything about how gcc deals with > thread-locals, but everything before that, all sounds good... > >> [Alan wrote:] >> >>> Can you give an example? >> >> >> The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and >> "gcc.dg/vect/vect-mask-load-1.c" >> currently test as: the new if-converter is "converting" something that`s >> already vectorizer-friendly... > >> [snip] >> >> However, TTBOMK the vectorizer already "understands" that in cases where >> its input looks like: >> >> x = c ? y : z; >> >> ... and 'y' and 'z' are both pure [side-effect-free] -- including, but not >> limited to, they must be non-"volatile" -- >> it may vectorize a loop containing code like the preceding, ignoring for >> this particular instance the C mandate >> that only one of {y, z} be evaluated... > > > My understanding, is that any decision as to whether one or both of y or z > is evaluated (when 'evaluation' involves doing any work, e.g. a load), has > already been encoded into the gimple/tree IR. Thus, if we are to only > evaluate one of 'y' or 'z' in your example, the IR will (prior to > if-conversion), contain basic blocks and control flow, that means we jump > around the one that's not evaluated. > > This appears to be the case in pr61194.c: prior to if-conversion, the IR for > the loop in barX is > > <bb 3>: > # i_16 = PHI <i_13(7), 0(2)> > # ivtmp_21 = PHI <ivtmp_20(7), 1024(2)> > _5 = x[i_16]; > _6 = _5 > 0.0; > _7 = w[i_16]; > _8 = _7 < 0.0; > _9 = _6 & _8; > if (_9 != 0) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > iftmp.0_10 = z[i_16]; > goto <bb 6>; > > <bb 5>: > iftmp.0_11 = y[i_16]; > > <bb 6>: > # iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)> > z[i_16] = iftmp.0_2; > i_13 = i_16 + 1; > ivtmp_20 = ivtmp_21 - 1; > if (ivtmp_20 != 0) > goto <bb 7>; > else > goto <bb 8>; > > <bb 7>: > goto <bb 3>; > > which clearly contains (unvectorizable!) control flow. Without > -ftree-loop-if-convert-stores, if-conversion leaves this alone, and > vectorization fails (i.e. the vectorizer bails out because the loop has >2 > basic blocks). With -ftree-loop-if-convert-stores, if-conversion produces > > <bb 3>: > # i_16 = PHI <i_13(4), 0(2)> > # ivtmp_21 = PHI <ivtmp_20(4), 1024(2)> > _5 = x[i_16]; > _6 = _5 > 0.0; > _7 = w[i_16]; > _8 = _7 < 0.0; > _9 = _6 & _8; > iftmp.0_10 = z[i_16]; // <== here > iftmp.0_11 = y[i_16]; // <== here > iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11; > z[i_16] = iftmp.0_2; > i_13 = i_16 + 1; > ivtmp_20 = ivtmp_21 - 1; > if (ivtmp_20 != 0) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > goto <bb 3>; > > where I have commented the conditional loads that have become unconditional. > (Hence, "-ftree-loop-if-convert-stores" looks misnamed - it affects how the > if-conversion phase converts loads, too - please correct me if I > misunderstand (Richard?) ?!) This contains no control flow, and so is > vectorizable.
Not sure why the above is only handled with -ftree-loop-if-convert-stores. It's clearly if-converting the load as the store is unconditional. Ok, the dump says iftmp.0_10 = z[i_16]; tree could trap... from if (gimple_assign_rhs_could_trap_p (stmt)) { if (ifcvt_can_use_mask_load_store (stmt)) { gimple_set_plf (stmt, GF_PLF_2, true); *any_mask_load_store = true; return true; } if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "tree could trap...\n"); return false; I suppose that should use ifcvt_could_trap_p instead (clearly we write to z[i_16] unconditionally after the conditional load). Note that it says the access may trap because the check is somewhat too conservative, not using range-info of i_16. Looks like to do this we'd need to enable some analysis code currently run conditional on flag_tree_loop_if_convert_stores only. Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c (revision 225599) +++ gcc/tree-if-conv.c (working copy) @@ -874,7 +874,7 @@ if_convertible_gimple_assign_stmt_p (gim return true; } - if (gimple_assign_rhs_could_trap_p (stmt)) + if (ifcvt_could_trap_p (stmt, refs)) { if (ifcvt_can_use_mask_load_store (stmt)) { @@ -1297,18 +1297,15 @@ if_convertible_loop_p_1 (struct loop *lo } } - if (flag_tree_loop_if_convert_stores) - { - data_reference_p dr; + data_reference_p dr; - for (i = 0; refs->iterate (i, &dr); i++) - { - dr->aux = XNEW (struct ifc_dr); - DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; - DR_RW_UNCONDITIONALLY (dr) = -1; - } - predicate_bbs (loop); + for (i = 0; refs->iterate (i, &dr); i++) + { + dr->aux = XNEW (struct ifc_dr); + DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; + DR_RW_UNCONDITIONALLY (dr) = -1; } + predicate_bbs (loop); for (i = 0; i < loop->num_nodes; i++) { @@ -1323,9 +1320,8 @@ if_convertible_loop_p_1 (struct loop *lo return false; } - if (flag_tree_loop_if_convert_stores) - for (i = 0; i < loop->num_nodes; i++) - free_bb_predicate (ifc_bbs[i]); + for (i = 0; i < loop->num_nodes; i++) + free_bb_predicate (ifc_bbs[i]); /* Checking PHIs needs to be done after stmts, as the fact whether there are any masked loads or stores affects the tests. */ @@ -1399,14 +1395,10 @@ if_convertible_loop_p (struct loop *loop res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs, any_mask_load_store); - if (flag_tree_loop_if_convert_stores) - { - data_reference_p dr; - unsigned int i; - - for (i = 0; refs.iterate (i, &dr); i++) - free (dr->aux); - } + data_reference_p dr; + unsigned int i; + for (i = 0; refs.iterate (i, &dr); i++) + free (dr->aux); free_data_refs (refs); free_dependence_relations (ddrs); fixes the testcase for me. Richard. > > (This is all without your scratchpad patch, of course.) IOW this being > vectorized, or not, relies upon the preceding if-conversion phase removing > the control flow. > > HTH > Alan >