[Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc

dberlin at dberlin dot org Thu, 09 Nov 2006 16:03:33 -0800


------- Comment #34 from dberlin at gcc dot gnu dot org  2006-11-10 00:03 
-------
Subject: Re:  [4.3 Regression] Misscompilation of spec2006 gcc

On 9 Nov 2006 21:48:25 -0000, dnovillo at redhat dot com
<[EMAIL PROTECTED]> wrote:
>
>
> ------- Comment #33 from dnovillo at redhat dot com  2006-11-09 21:48 -------
> Subject: Re:  [4.3 Regression] Misscompilation
>  of spec2006 gcc
>
> dberlin at dberlin dot org wrote on 11/09/06 16:28:
>
> > Uh, LIM and store sinking are too.  Roughly all of our memory
> > optimizations are.
> >
> They are?  Really?  Can you show me where exactly?

They won't get incorrect results. They will get less good results than
they do now.

Take LIM and store motion (sorry, i meant SM, not store sinking)
determine_max_movement relies on the VIRTUAL_KILL information to
determine what must be moved together.  Because things that would
previously kill different symbol versions, now will kill the same
symbol version (due to being factored), it will add false dependencies
unless someone changes it.

Take the following example:
int e,f,g,h;
int main(int argc)
{
  int *a, *b, *d;
 int i;
 a = &e;
 if (argc)
   a = &f;
 b = &g;
 if (argc)
   b = &h;
 d = &e;
 if (argc)
   d = &h;
 for (i = 0; i < 50; i++)
   {
     *a = 1;
     *b = 2;
     *d = 3;
   }
}

previously, you would have ...
 #   e_22 = V_MAY_DEF <e_14>;
  #   f_23 = V_MAY_DEF <f_15>;
  *a_1 = 1;
  #   g_24 = V_MAY_DEF <g_16>;
  #   h_25 = V_MAY_DEF <h_17>;
  *b_2 = 2;
  #   e_26 = V_MAY_DEF <e_22>;
  #   h_27 = V_MAY_DEF <h_25>;
  *d_3 = 3;

Note that *a and *b do not vdef any symbol that is the same.

mem-ssa gives you (as of today):

  # .MEM_16 = VDEF <.MEM_14, f_20>
  *a_1 = 1;
  # .MEM_17 = VDEF <.MEM_14, g_21>
  *b_2 = 2;
  # .MEM_18 = VDEF <.MEM_16, .MEM_17>
  *d_3 = 3;

note that now *a and *b both vdef MEM_14.

SM is going to say these stores are dependent on each other because
they "kill" the same version of the same variable unless you teach it
to look at the get_loads_and_stores bitmap.
Previously, it would not.

> > The changes i have to make to PRE (and to the other things) to
> > account for this is actually to rebuild the non-mem-ssa-factored (IE
> > the current factored) form out of the chains by seeing what symbols
> > they really affect.
> >
> OK, so how come you were so positive about the new interface?

When have i been overabundantly positive?
I said I'd deal with it.  I'm neither here nor there.  I've relied on
your statements that you believe it will make things significantly
better without loss of precision.  We are going to pay a cost in time
for passes to make use of this information. I believed you were aware
of this.

>  I need to
> understand what was the great difficulty you ran into that made you
> change your mind.  I need to see a specific example.
>
> See, the UD chains you get in mem-ssa are neither incomplete nor wrong.
Nobody said they are wrong, but I would actually argue they are no
longer really the same as SSA in a way that matters, if you want to
pick nits.
SSA variables have not only the property that they are *assigned to*
once, but the property that they are *defined* by a single operation.
Our current virtual operand scheme has this property.
Yours does not, because variables may be defined multiple times,
although they are still singley assigned.
You can argue this is not a requirement of SSA.  Honestly, it makes no
difference to me.  The upshot is that by losing this property, you
make them less useful than they could be.

> The symbols affected are right there in plain sight, so there is no
> loss of any information.

Errr, last time i looked, you had to call a function to get the
*actual* list of loads or stores affected by a statement.
Why does this matter?

All of our memory optimizations are trying to figure out three things:

1. Will two memory accesses return the same results (PRE, DSE)?
2. Do these two memory accesses have a dependency (PRE, SM, DSE, LIM).
3. If I hoist this memory to block X, will it still have the same
value as it does in block Y (PRE, SM, LIM).

Your stuff has no real affect on #2, but it makes #1 and #3 harder
(not less precise, mind you), at the cost of reducing the number of
operands.

It does so by factoring stores in a way that disables the ability to
see where loads are not *currently*, but *could*, validly be live.  In
other words, it was previously possible to simply determine whether
two stores touched the same piece of memory by looking at the
versions.  This is no longer true.

PRE does #1 and #3 through value numbering memory and tracking the
distinct live ranges of virtual variables.
 SM and LIM do #3 by simply using dependency information to determine
what needs to be moved together and grouping operations that vdef the
same variable together.

SM and LIM will thus still get *correct* information in the above
example, but it's going to get false dependencies due to defs of the
same mem version, unless something disambiguates between those store
dependencies by looking at the underlying loads and stores involved.

Previously, you could value number memory (and thus answer questions
#1) simply by making a list of all the virtual variable  versions
associated with it, and the value number of the underlying accesses.
You could see whether these value numbers would still be *obviously*
be valid affected moving above or below a certain store *just by
seeing what  virtual variable versions were defined by that statement
and whether it was a vuse version used by one of the value numbers in
your statements*.

It is no longer possible to get a precise answer this way when you
factor stores the way you do, because the name of the virtual variable
defined by a store is actually no longer actually a function of what
symbols it defines anymore.
Take the above case.
If we simply use virtual variable versions to value number memory, we
will believe that *a and *b are possible stores to the same piece of
memory, even though they are not.

In order to get the *correct* answer in the presence of phi nodes, we
currently collect the variables the phi nodes were joining together,
and consider a def of the phi result to be a def of the versions
joined by the phi

IE
if you had

phi_result_3  = <a_1, b_2>
VUSE <phi_result_3>
*foo = a
would be considered a kill of any value numbers containing vuses of
either a_1 or b_2.

So how do i plan on recovering the information of figuring out where
things can be moved to?

Well, essentially, PRE wants a a set of "live memory variable
versions" for a given statement so it can see where things can be
moved to.

I am going to start out at the top of the program, and walk in
dominator order, generating a bitmap.

We initially assume the version of everything in the
load_store_value_bitmap is 0.

Every time we see a vdef, we call get_loads_and_stores.   For each
variable in the underlying load and store bitmap, we remove the bit
for the old version's id from load_store_value_bitmap, and add a bit
for the new version's id.

Every time we see a vuse, we use the current load_store_value_bitmap
as part of it's value number, and attach the bitmap to the current
statement.

We attach the load_store_value_bitmap to the top and bottom of each block.

Every time we see a phi node, we generate new versions for each of the
loaded  and stored variables of the statements that this phi node is
joining.

We can translate a value upwards through phis by looking at what bits
in the bitmaps changed for each edge.

So basically, i'm doing virtual SSA renaming and storing some of the
reaching information persistently.

This is how Load PRE was done in the SSAPRE infrastructure, and it
works out okay.
But it really does mean that we aren't using the U-D chains anymore
for loads and stores, because we can't.

>
> > For at least all the opts i see us doing, it makes them more or less
> > useless without doing things (like reexpanding them) first. Because
> > this is true, I'm not sure it's a good idea at all, which is why i'm
> > still on the fence.
> >
> But you still haven't *shown* me where the hardness or slowness comes
> in.

How not?
As i've told you before, and showed you before, it was previously the
case we could value number memory directly and determine whether it
was going to be legal in a given block. mem-ssa breaks this.
It was previously possible to accurately and quickly determine where
stores to memory that conflicted are just by looking at versions. This
is no longer true.  This in turn removes the ability to determine
where new loads can be placed simply by looking at VDEF RHS variables.

I've reexplained this above.

>  Granted, the work is still unfinished so we can't really do actual
> measurements.  But I need to understand where the difficulties will be
> so that I can accomodate the infrastructure.  It's obviously not my
> intent to make things harder to use.
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29680
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29680

[Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc

Reply via email to