On Mon, Apr 9, 2012 at 8:00 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Mar 30, 2012 at 5:43 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Fri, Mar 30, 2012 at 4:15 PM, Richard Guenther
>> <richard.guent...@gmail.com> wrote:
>>> On Thu, Mar 29, 2012 at 5:25 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Thu, Mar 29, 2012 at 6:14 PM, Richard Guenther
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Thu, Mar 29, 2012 at 12:10 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>>> On Thu, Mar 29, 2012 at 6:07 PM, Richard Guenther
>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>> On Thu, Mar 29, 2012 at 12:02 PM, Bin.Cheng <amker.ch...@gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>> Following is the tree dump of 094t.pre for a test program.
>>>>>>>> Question is loads of D.5375_12/D.5375_14 are redundant on path
>>>>>>>> <bb2, bb7, bb5, bb6>,
>>>>>>>> but why not lowered into basic block 3, where it is used.
>>>>>>>>
>>>>>>>> BTW, seems no tree pass handles this case currently.
>>>>>>>
>>>>>>> tree-ssa-sink.c should do this.
>>>>>>>
>>>>>> It does not work for me, I will double check and update soon.
>>>>>
>>>>> Well, "should" as in, it's the place to do it.  And certainly the pass
>>>>> can sink loads, so this must be a missed optimization.
>>>>>
>>>> Curiously, it is said explicitly that "We don't want to sink loads from
>>>> memory." in tree-ssa-sink.c function statement_sink_location, and the
>>>> condition is
>>>>
>>>>   if (stmt_ends_bb_p (stmt)
>>>>       || gimple_has_side_effects (stmt)
>>>>       || gimple_has_volatile_ops (stmt)
>>>>       || (gimple_vuse (stmt) && !gimple_vdef (stmt))  <-----------------check load
>>>>       || (cfun->has_local_explicit_reg_vars
>>>>           && TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))) == BLKmode))
>>>>     return false;
>>>>
>>>> I haven't found any clue about this decision in ChangeLogs.
>>>
>>> Ah, that's probably because usually you want to hoist loads and sink stores,
>>> separating them (like a scheduler would do).  We'd want to restrict sinking
>>> of loads to sink into not post-dominated regions (thus where they end up
>>> being executed less times).
>
> Hi Richard,
> I am testing a patch to sink load of memory to proper basic block.
> Everything goes fine except auto-vectorization, sinking of load sometime
> corrupts the canonical form of data references.  I haven't touched auto-vec
> before and cannot tell whether it's good or bad to do sink before auto-vec.
> For example, the slp-cond-1.c
>
> <bb 3>:
>   # i_39 = PHI <i_32(11), 0(2)>
>   D.5150_5 = i_39 * 2;
>   D.5151_10 = D.5150_5 + 1;
>   D.5153_17 = a[D.5150_5];
>   D.5154_19 = b[D.5150_5];
>   if (D.5153_17 >= D.5154_19)
>     goto <bb 9>;
>   else
>     goto <bb 4>;
>
> <bb 9>:
>   d0_6 = d[D.5150_5];   <-----this is sunk from bb3
>   goto <bb 5>;
>
> <bb 4>:
>   e0_8 = e[D.5150_5];   <-----this is sunk from bb3
>
> <bb 5>:
>   # d0_2 = PHI <d0_6(9), e0_8(4)>
>   k[D.5150_5] = d0_2;
>   D.5159_26 = a[D.5151_10];
>   D.5160_29 = b[D.5151_10];
>   if (D.5159_26 >= D.5160_29)
>     goto <bb 10>;
>   else
>     goto <bb 6>;
>
> <bb 10>:
>   d1_11 = d[D.5151_10];   <-----this is sunk from bb3
>   goto <bb 7>;
>
> <bb 6>:
>   e1_14 = e[D.5151_10];   <-----this is sunk from bb3
>
> <bb 7>:
>   .......
>
> I will look into auto-vect but not sure how to handle this case.
>
> Any comments?  Thanks very much.

Simple - the vectorizer expects empty latch blocks.  So simply never sink
stuff into latch-blocks - I think the current code already tries to avoid
that for regular computations.

Richard.

> --
> Best Regards.
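
To make the two restrictions discussed in this thread concrete (no sinking
into latch blocks, and sinking loads only into regions that do not
post-dominate the original block), here is a minimal C sketch.  It is not
code from tree-ssa-sink.c and does not use GCC's real data structures: the
toy_block and toy_loop types, their fields, and both functions are
hypothetical stand-ins invented for illustration, and the post-dominance
answer is assumed to be precomputed and stored as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only.  These are
   NOT GCC's basic_block or loop structures.  */
struct toy_loop;

struct toy_block
{
  int index;                  /* Block number, e.g. 9 for <bb 9>.  */
  struct toy_loop *loop;      /* Innermost enclosing loop, or NULL.  */
  bool postdominates_source;  /* Assumed precomputed: true if this block
                                 executes whenever the load's original
                                 block executes.  */
};

struct toy_loop
{
  struct toy_block *latch;    /* Block whose edge goes back to the header.  */
};

/* Return true if BB is the latch block of its innermost loop.  */
bool
toy_is_latch (const struct toy_block *bb)
{
  assert (bb != NULL);
  return bb->loop != NULL && bb->loop->latch == bb;
}

/* Return true if a load may be sunk into TARGET under the two
   restrictions from the thread.  */
bool
toy_may_sink_load_into (const struct toy_block *target)
{
  assert (target != NULL);

  /* Keep latch blocks empty, since the vectorizer expects that.  */
  if (toy_is_latch (target))
    return false;

  /* If TARGET post-dominates the source block, the load would run just
     as often after sinking, so there is nothing to gain.  */
  if (target->postdominates_source)
    return false;

  return true;
}
```

In this toy model a conditional arm such as <bb 9> in the dump above would
be an acceptable target, while the loop's latch block and a join block that
always executes would both be rejected.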