On Thu, Jan 19, 2017 at 12:07 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener >> >> Or a later pass introduced it (in this case, the vectorizer). >> >>> Predcom could be the >>> only pass that can handle such case as it understands data reference >>> better. Note Martin's patch is not to skip handling of length == 0 >>> chain, later ref will still be CSEed with result of root ref, only the >>> combination operation like chain1 + chain2 is skipped. In this case, >>> following dom should be able to handle such (loop independent) cse >>> opportunities. >> >> I must admit I don't completely understand the consequences of this >> disabling but of course DOM should also be able to handle the CSE >> (ok, DOM is known to be quite weak with memory equivalence but >> it's not that data-dependence is really better in all cases). >> >> Can't we simply re-order refs in new_chain appropriately or handle >> this case in combinable_refs_p instead? > It's not refs need to be reordered, root ref always dominates others. > But yes, we need to find a dominator insertion place for combined > operation. Looking at function reassociate_to_the_same_stmt, it > simply inserts new_stmt at root_stmt of root ref, which causes ICE in > this case. The new_stmt needs to be inserted at a place also > dominating combination of later refs. We can either compute the > information in place, or compute and pass the information from > previous combinable_refs_p. This should be the real fix. Hi All, As analyzed, root cause is Predcom inserts combined stmt at place only wrto the root refs. This is not enough because after combination, result is also used by the following combined refs, not only root refs. This patch fixes the ICE by processing all refs reversely and computing dominance point for insertion. When it comes to the root refs, dominance point is ready for use. Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?
Thanks, bin 2017-01-19 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/70754 * tree-predcom.c (stmt_combining_refs): New parameter INSERT_BEFORE. (reassociate_to_the_same_stmt): New parameter INSERT_BEFORE. Insert combined stmt before it if not NULL. (combine_chains): Process refs reversely and compute dominance point for root ref. gcc/testsuite/ChangeLog 2017-01-19 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/70754 * gfortran.dg/pr70754.f90: New test. >> That is, I understand the patch as a hack as it should be always >> possible to find dominating refs? >> >> In fact the point of the assert tells us a simpler fix may be >> >> Index: tree-predcom.c >> =================================================================== >> --- tree-predcom.c (revision 244519) >> +++ tree-predcom.c (working copy) >> @@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2 >> break; >> } >> } >> + if (new_chain->length == 0 >> + && ! new_chain->has_max_use_after) >> + { >> + release_chain (new_chain); >> + return NULL; >> + } >> >> ch1->combined = true; >> ch2->combined = true; >> >> which obviously matches the assert we run into for the testcase? I'd >> be ok with that >> (no stmt_dominates_stmt_p, heh) with a suitable comment before it. >> >> Richard. >>
diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90 new file mode 100644 index 0000000..d7e790c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr70754.f90 @@ -0,0 +1,35 @@ +! { dg-do compile } +! { dg-options "-Ofast" } +module m + implicit none + private + save + + integer, parameter, public :: & + ii4 = selected_int_kind(6), & + rr8 = selected_real_kind(13) + + integer (ii4), dimension(40,40,199), public :: xyz + public :: foo +contains + subroutine foo(a) + real (rr8), dimension(40,40), intent(out) :: a + real (rr8), dimension(40,40) :: b + integer (ii4), dimension(40,40) :: c + integer i, j + + do i=1,20 + b(i,j) = 123 * a(i,j) + 34 * a(i,j+1) & + + 34 * a(i,j-1) + a(i+1,j+1) & + + a(i+1,j-1) + a(i-1,j+1) & + + a(i-1,j-1) + c(i,j) = 123 + end do + + where ((xyz(:,:,2) /= 0) .and. (c /= 0)) + a = b/real(c) + elsewhere + a = 456 + endwhere + end subroutine foo +end module m diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index f0b70a6..9723e9c 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -2164,10 +2164,11 @@ remove_name_from_operation (gimple *stmt, tree op) } /* Reassociates the expression in that NAME1 and NAME2 are used so that they - are combined in a single statement, and returns this statement. */ + are combined in a single statement, and returns this statement. Note the + statement is inserted before INSERT_BEFORE if it's not NULL. */ static gimple * -reassociate_to_the_same_stmt (tree name1, tree name2) +reassociate_to_the_same_stmt (tree name1, tree name2, gimple *insert_before) { gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2; gassign *new_stmt, *tmp_stmt; @@ -2224,6 +2225,12 @@ reassociate_to_the_same_stmt (tree name1, tree name2) var = create_tmp_reg (type, "predreastmp"); new_name = make_ssa_name (var); new_stmt = gimple_build_assign (new_name, code, name1, name2); + if (insert_before && stmt_dominates_stmt_p (insert_before, s1)) + bsi = gsi_for_stmt (insert_before); + else + bsi = gsi_for_stmt (s1); + + gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT); var = create_tmp_reg (type, "predreastmp"); tmp_name = make_ssa_name (var); @@ -2240,7 +2247,6 @@ reassociate_to_the_same_stmt (tree name1, tree name2) s1 = gsi_stmt (bsi); update_stmt (s1); - gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT); gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT); return new_stmt; @@ -2249,10 +2255,11 @@ reassociate_to_the_same_stmt (tree name1, tree name2) /* Returns the statement that combines references R1 and R2. In case R1 and R2 are not used in the same statement, but they are used with an associative and commutative operation in the same expression, reassociate - the expression so that they are used in the same statement. */ + the expression so that they are used in the same statement. The combined + statement is inserted before INSERT_BEFORE if it's not NULL. */ static gimple * -stmt_combining_refs (dref r1, dref r2) +stmt_combining_refs (dref r1, dref r2, gimple *insert_before) { gimple *stmt1, *stmt2; tree name1 = name_for_ref (r1); @@ -2263,7 +2270,7 @@ stmt_combining_refs (dref r1, dref r2) if (stmt1 == stmt2) return stmt1; - return reassociate_to_the_same_stmt (name1, name2); + return reassociate_to_the_same_stmt (name1, name2, insert_before); } /* Tries to combine chains CH1 and CH2 together. If this succeeds, the @@ -2309,14 +2316,42 @@ combine_chains (chain_p ch1, chain_p ch2) new_chain->rslt_type = rslt_type; new_chain->length = ch1->length; - for (i = 0; (ch1->refs.iterate (i, &r1) - && ch2->refs.iterate (i, &r2)); i++) + gimple *insert = NULL; + auto_vec<dref> tmp_refs; + gcc_assert (ch1->refs.length () == ch2->refs.length ()); + /* Process in reverse order so dominance point is ready when it comes + to the root ref. */ + for (i = ch1->refs.length (); i > 0; i--) { + r1 = ch1->refs[i - 1]; + r2 = ch2->refs[i - 1]; nw = XCNEW (struct dref_d); - nw->stmt = stmt_combining_refs (r1, r2); nw->distance = r1->distance; + nw->stmt = stmt_combining_refs (r1, r2, i == 1 ? insert : NULL); + + /* Record dominance point where root combined stmt should be inserted + for chains with 0 length. Though all root refs dominate following + refs, it's possible the combined stmt doesn't. See PR70754. */ + if (ch1->length == 0 + && (insert == NULL || stmt_dominates_stmt_p (nw->stmt, insert))) + insert = nw->stmt; + + tmp_refs.safe_push (nw); + } + + /* Restore the order for new chain's refs. */ + for (i = tmp_refs.length (); i > 0; i--) + new_chain->refs.safe_push (tmp_refs[i - 1]); + + ch1->combined = true; + ch2->combined = true; - new_chain->refs.safe_push (nw); + /* For chains with 0 length, has_max_use_after must be true since root + combined stmt must dominates others. */ + if (new_chain->length == 0) + { + new_chain->has_max_use_after = true; + return new_chain; } new_chain->has_max_use_after = false; @@ -2331,8 +2366,6 @@ combine_chains (chain_p ch1, chain_p ch2) } } - ch1->combined = true; - ch2->combined = true; return new_chain; }