On Wed, 25 Apr 2012, Aldy Hernandez wrote:

> On 04/25/12 06:45, Richard Guenther wrote:
> > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<al...@redhat.com> wrote:
> > > On 04/13/12 03:46, Richard Guenther wrote:
> > > >
> > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<al...@redhat.com>
> > > > wrote:
> >
> > +  /* ?? Perhaps we should cache this somewhere in the BB, or are
> > +     multiple levels of empty BB's common. ?? */
> > +  FOR_EACH_EDGE (e, ei, bb->preds)
> > +    {
> > +      int res = bb_in_transaction_1 (e->src);
> > +      if (res != -1)
> > +        return (bool) res;
> > +    }
> > +  return false;
> >
> > that's weird code - if the predecessors are not all in (or all not in)
> > a transaction you return a random value.  So it seems this is somewhat
> > fragile.
> >
> > If bb status can be derived from looking at a single stmt, why is the
> > transaction-ness not recorded as a basic-block flag then?  Thus,
> > why do we have a bit in gimple stmts?
>
> How about we get rid of the GIMPLE bit altogether and keep it in the BB
> flags like you mention?  See patch.

Yeah, that looks good.
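With the bit gone the check also becomes trivial - something like this
(a sketch only; I'm assuming the patch names the flag BB_IN_TRANSACTION):

  /* Sketch: transaction-ness as a basic-block flag, no predecessor
     walk and no per-stmt bit to keep in sync.  */
  static bool
  bb_in_transaction (basic_block bb)
  {
    return (bb->flags & BB_IN_TRANSACTION) != 0;
  }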
> > +  bool single_exit_p = single_pred_p (ex->dest);
> >
> > that's a strange variable name ;)
>
> How about loop_has_only_one_exit? :)

Fine ;)

> > +/* Helper function for execute_sm.  On every path that sets REF, set
> > +   an appropriate flag indicating the store.  */
> > +
> > +static tree
> > +execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
> > +{
> > ...
> > +  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
> > +    {
> > +      gimple_stmt_iterator gsi;
> > +      gimple stmt;
> > +
> > +      gsi = gsi_start_bb (gimple_bb (loc->stmt));
> > +      for (; !gsi_end_p (gsi); gsi_next (&gsi))
> > +        if (gsi_stmt (gsi) == loc->stmt)
> >
> > does not seem to do it on every path but on every REF setter.  And
> > instead of the innermost loop just use gsi_for_stmt (loc->stmt).
>
> Updated comment.  Fixed.

> > +  /* Emit the load code into the latch, so that we are sure it will
> > +     be processed after all dependencies.  */
> > +  latch_edge = loop_latch_edge (loop);
> > +  load = gimple_build_assign (mem_ssa, unshare_expr (ref->mem));
> >      lim_data = init_lim_data (load);
> >      lim_data->max_loop = loop;
> >      lim_data->tgt_loop = loop;
> > +  gsi_insert_on_edge (latch_edge, load);
> > +  load = gimple_build_assign (tmp_var, mem_ssa);
> > +  lim_data = init_lim_data (load);
> > +  lim_data->max_loop = loop;
> > +  lim_data->tgt_loop = loop;
> > +  gsi_insert_on_edge (latch_edge, load);
> >
> > the 2nd assign is no longer a "load", so I suppose you don't need any
> > lim_data for it?
>
> Re-did all this.  See patch.

> > +  else if (store == STORE_IF_CHANGED_FLAG)
> > +    execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var,
> > +                           store_flag);
> >
> > and this can pass NULL instead of mem_ssa?
>
> Got rid of mem_ssa altogether, as well as the if-changed approach, which
> proved inferior to the flag-based approach.

Good.

> > Overall this looks reasonable.  With also handling trapping things I
> > meant that we should be able to apply store-motion to
> >
> > int a[256];
> > void foo (int *p)
> > {
> >   int i;
> >   for (i = 0; i < 256; i++)
> >     if (flag)
> >       *p = a[i];
> > }
> >
> > but we cannot, because in the transform
> >
> >   lsm = *p;
> >   for (i = 0; i < 256; ++i)
> >     if (flag)
> >       lsm = a[i];
> >   *p = lsm;
> >
> > even when the store is properly conditional, the load is not (and you
> > do not change that).  The unconditional load may trap here.  What I
> > meant to say is that, if we use the flag-based approach, we do not need
> > the load at all unless there is also a load involved inside the loop.
> > If there is also a load inside the loop we'd need to perform the load
> > there (or not handle this case).
>
> Ahh!  Yes, this sounds very profitable.  Would you mind me doing this in
> a separate patch?  I am already handling loads in this incarnation, so
> this should be straightforward.

No, it's fine to do it separately.
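For the record, the flag-based result for the example above would look
roughly like this (a sketch with made-up local names, not the exact
output of the patch):

  /* Store motion with a flag: no load of *p at all, since nothing in
     the loop reads *p.  The store on the exit happens only if the
     original store executed, so no new traps are introduced.  */
  lsm_flag = false;
  for (i = 0; i < 256; ++i)
    if (flag)
      {
        lsm = a[i];
        lsm_flag = true;
      }
  if (lsm_flag)
    *p = lsm;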
> Speaking of loads, I am keeping the information as an additional bitmap
> in `memory_accesses', as ->refs_in_loop was set for stores as well, so I
> couldn't depend on it.  Let me know if you have another idea.

Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of

+  bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
+                            loop->num);
+  ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);

  ref_read_in_loop_p
    = bitmap_bit_p (refs, ref->id) && !bitmap_bit_p (stores, ref->id);

?  But maybe that doesn't work if a ref is both read and stored to.

Btw, rather than adding a bitmap to memory_accesses I'd rather add a
mark_ref_loaded corresponding to mark_ref_stored (or rather merge both
into one) and a bitmap to struct mem_ref.

> Mildly tested.  Wadaya think?

Looks good overall, minus the detail of remembering refs loaded.

Richard.

> Thanks again.
> Aldy

--
Richard Guenther <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer