On Mon, 7 May 2012, Aldy Hernandez wrote:

> Hi.  Sorry for the delay.  There were various tricky hiccups along the
> way to bootstrappability and regression cleanliness...
>
> On 04/26/12 04:51, Richard Guenther wrote:
> > On Wed, 25 Apr 2012, Aldy Hernandez wrote:
> >
> > > On 04/25/12 06:45, Richard Guenther wrote:
> > > > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez <al...@redhat.com> wrote:
> > > > > On 04/13/12 03:46, Richard Guenther wrote:
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez <al...@redhat.com> wrote:
>
> > > Speaking of loads, I am keeping the information as an additional
> > > bitmap in `memory_accesses', as ->refs_in_loop was set for stores
> > > as well, so I couldn't depend on it.  Let me know if you have
> > > another idea.
> >
> > Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of
> >
> > +  bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> > +                            loop->num);
> > +  ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
> >
> >    ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)
> >                         && !bitmap_bit_p (stores, ref->id);
> >
> > ?  But maybe that doesn't work if a ref is both read and stored to.
> > Btw, rather than adding a bitmap to memory_accesses I'd rather add
> > a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> > both into one) and a bitmap to struct mem_ref.
>
> I have done so as mark_ref_loaded().  I first tried to merge both into
> a generic mark_ref(), to avoid iterating twice, but I was concerned
> with the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not
> knowing if I should exit the loop altogether in the merged case or
> what.  In either case, the code was somewhat convoluted, so I opted
> for a separate clean function.
>
> In scanning the gimple stream for the loads I noticed that instances
> of two component refs (say foo.bar) were not pointer comparable, so I
> am asking the alias oracle with refs_may_alias_p.  It also seemed to
> make more sense wrt memory chunks that may alias.  What do you think?
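[For concreteness: mark_ref_stored in tree-ssa-loop-im.c walks from the
given loop outward through the loop tree, setting bits until it reaches
an already-marked loop.  A mark_ref_loaded built the same way -- this is
a sketch, assuming the patch adds a `loaded' bitmap to struct mem_ref --
would look roughly like:

  static void
  mark_ref_loaded (mem_ref_p ref, struct loop *loop)
  {
    /* Mark REF as loaded in LOOP and every enclosing loop, stopping
       once an outer loop already has its bit set.  */
    for (;
         loop != current_loops->tree_root
         && !bitmap_bit_p (ref->loaded, loop->num);
         loop = loop_outer (loop))
      bitmap_set_bit (ref->loaded, loop->num);
  }

The early exit Aldy mentions is per-bitmap, which is why a merged
mark_ref() is awkward: a loop whose `stored' bit is already set may
still need its `loaded' bit set, so neither bitmap's early exit can
terminate a combined walk.]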
No, it asks for equal must aliases (it will replace them after all!).
You should use operand_equal_p to compare reference trees.

But why do all the following dance

+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+         compared with pointer equality, so ask the alias oracle.  */
+      if (ref->mem == t
+          || ((TREE_CODE (t) == SSA_NAME
+               || DECL_P (t)
+               || handled_component_p (t)
+               || TREE_CODE (t) == MEM_REF
+               || TREE_CODE (t) == TARGET_MEM_REF)
+              && refs_may_alias_p (t, ref->mem)))
+        mark_ref_loaded (ref, loop);
+    }

at all and not just

   if (is_stored)
     mark_ref_stored (ref, loop);
   else
     mark_ref_loaded (ref, loop);

and, similarly, in the !mem case mark the ref loaded unconditionally:

   mark_ref_loaded (ref, loop);
   if (gimple_vdef (stmt))
     mark_ref_stored (ref, loop);

> This patch is far more robust and fully tested.  Two wrenches to
> complicate things though...
>
> First, while inserting the code writing the _LSM_ temporaries into the
> original variables, I found a handful of tests where the order of the
> writes mattered (aliased locations, inlined code, etc, IIRC).  [This
> is in loops where multiple stores are moved.]  So iterating and
> inserting on the exit edges caused the stores to be saved in reverse.
> I am now saving the edges and threading the code, so they happen in
> the correct order:
>
>        if (foo_flag_lsm)
>                foo = foo_lsm;
>        if (foo2_flag_lsm)
>                foo2 = foo2_lsm;
>        actual_exit:
>
> This required some edge redirection so we didn't jump to the original
> actual exit prematurely when handling multiple moved stores.  The
> dominator tree needed to be updated, and there is one instance where I
> couldn't find a clever way of preserving the order without jumping
> through an inordinate number of hoops.  It seemed easier to call
> recompute_dominator.
>
> Second, avoiding the load, unless it happens in the loop as you have
> suggested, brought about some bogus uninitialized-use warnings with
> the LSM temporaries.  What happens is that compute_ssa() creates uses
> of the LSM temporaries that are not yet defined, so we end up with PHI
> nodes with a default definition entry (D).  Since we are sure the
> eventual uses will be gated by their corresponding flag, I created PHI
> nodes with an initial value of 0 in the preheader of the loops in
> which they are used.  This solved the problem.  I also played with
> initializing to 0 at the entry block, but that seemed a bit too
> invasive.

Hm.  Btw, see also PR39612 which you could solve with that?

> Andrew suggested the correct fix was to add a new pass that was able
> to do some ?? flow sensitive data flow analysis ?? that could discover
> these unreachable paths and insert the 0 PHIs at the start of the
> blocks automatically.  But that seemed like far too much work,
> considering how long it's taken me to get this far ;-).
>
> Bootstrapped and fully tested with multi_threaded_model_p=true
> hard-coded.  This is the revision I'd like to contribute, sans the
> hardcoded value.
>
> What do you think?

I notice

+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);

inserting code into the latch block is bad - the vectorizer will be
confused by non-empty latches.

You might at first (as you suggested yourself ;)) avoid trying to
tackle the load issue.  The patch is already complex enough, and a
candidate for splitting out is the BB_IN_TRANSACTION change (hereby
pre-approved).
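[The usual alternative to touching the latch is to queue statements on
an edge and let the GIMPLE iterator machinery split edges as needed.
A sketch, assuming tmp_var is the LSM temporary for ref->mem:

  /* Build the load of the reference into the temporary and queue it
     on the preheader edge; gsi_commit_edge_inserts () later splits
     edges and emits the queued statements, leaving the latch empty.  */
  gimple load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
  gsi_insert_on_edge (loop_preheader_edge (loop), load);
  /* ... after all edge insertions are queued ...  */
  gsi_commit_edge_inserts ();

This is essentially what execute_sm already does for the preheader
load, so the non-empty-latch problem only arises when the load is
deliberately sunk into the loop.]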
Your ChangeLog mentions changes that are no longer necessary (the
target hook).

I didn't quite get the store order issue - we _do_ preserve store
ordering right now, do we?  So how come introducing the flags changes
that?

Thanks,
Richard.
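[For readers following the thread, a source-level picture of the
flag-based store motion being discussed; identifiers here are
illustrative, not taken from the patch:

  int g;

  void
  with_store_motion (int n, int cond, int v)
  {
    int g_lsm = g;      /* hoisted load (the part under discussion) */
    _Bool g_flag = 0;
    int i;
    for (i = 0; i < n; i++)
      if (cond)
        {
          g_lsm = v;    /* store redirected to the LSM temporary */
          g_flag = 1;   /* remember that a store actually happened */
        }
    if (g_flag)
      g = g_lsm;        /* write back only on paths that stored */
  }

Because the write-back to g is guarded by g_flag, no store is
introduced on executions that never stored, which is what makes the
transformation safe under the multi-threaded model
(multi_threaded_model_p).  With several moved stores, the guarded
write-backs on an exit must be emitted in the original store order,
which is the ordering issue Aldy describes above.]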