------- Comment #34 from dberlin at gcc dot gnu dot org 2006-11-10 00:03 ------- Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc
On 9 Nov 2006 21:48:25 -0000, dnovillo at redhat dot com <[EMAIL PROTECTED]> wrote: > > > ------- Comment #33 from dnovillo at redhat dot com 2006-11-09 21:48 ------- > Subject: Re: [4.3 Regression] Misscompilation > of spec2006 gcc > > dberlin at dberlin dot org wrote on 11/09/06 16:28: > > > Uh, LIM and store sinking are too. Roughly all of our memory > > optimizations are. > > > They are? Really? Can you show me where exactly? They won't get incorrect results. They will get less good results than they do now. Take LIM and store motion (sorry, i meant SM, not store sinking) determine_max_movement relies on the VIRTUAL_KILL information to determine what must be moved together. Because things that would previously kill different symbol versions, now will kill the same symbol version (due to being factored), it will add false dependencies unless someone changes it. Take the following example: int e,f,g,h; int main(int argc) { int *a, *b, *d; int i; a = &e; if (argc) a = &f; b = &g; if (argc) b = &h; d = &e; if (argc) d = &h; for (i = 0; i < 50; i++) { *a = 1; *b = 2; *d = 3; } } previously, you would have ... # e_22 = V_MAY_DEF <e_14>; # f_23 = V_MAY_DEF <f_15>; *a_1 = 1; # g_24 = V_MAY_DEF <g_16>; # h_25 = V_MAY_DEF <h_17>; *b_2 = 2; # e_26 = V_MAY_DEF <e_22>; # h_27 = V_MAY_DEF <h_25>; *d_3 = 3; Note that *a and *b do not vdef any symbol that is the same. mem-ssa gives you (as of today): # .MEM_16 = VDEF <.MEM_14, f_20> *a_1 = 1; # .MEM_17 = VDEF <.MEM_14, g_21> *b_2 = 2; # .MEM_18 = VDEF <.MEM_16, .MEM_17> *d_3 = 3; note that now *a and *b both vdef MEM_14. SM is going to say these stores are dependent on each other because they "kill" the same version of the same variable unless you teach it to look at the get_loads_and_stores bitmap. Previously, it would not. > > The changes i have to make to PRE (and to the other things) to > > account for this is actually to rebuild the non-mem-ssa-factored (IE > > the current factored) form out of the chains by seeing what symbols > > they really affect. > > > OK, so how come you were so positive about the new interface? When have i been overabundantly positive? I said I'd deal with it. I'm neither here nor there. I've relied on your statements that you believe it will make things significantly better without loss of precision. We are going to pay a cost in time for passes to make use of this information. I believed you were aware of this. > I need to > understand what was the great difficulty you ran into that made you > change your mind. I need to see a specific example. > > See, the UD chains you get in mem-ssa are neither incomplete nor wrong. Nobody said they are wrong, but I would actually argue they are no longer really the same as SSA in a way that matters, if you want to pick nits. SSA variables have not only the property that they are *assigned to* once, but the property that they are *defined* by a single operation. Our current virtual operand scheme has this property. Yours does not, because variables may be defined multiple times, although they are still singley assigned. You can argue this is not a requirement of SSA. Honestly, it makes no difference to me. The upshot is that by losing this property, you make them less useful than they could be. > The symbols affected are right there in plain sight, so there is no > loss of any information. Errr, last time i looked, you had to call a function to get the *actual* list of loads or stores affected by a statement. Why does this matter? All of our memory optimizations are trying to figure out three things: 1. Will two memory accesses return the same results (PRE, DSE)? 2. Do these two memory accesses have a dependency (PRE, SM, DSE, LIM). 3. If I hoist this memory to block X, will it still have the same value as it does in block Y (PRE, SM, LIM). Your stuff has no real affect on #2, but it makes #1 and #3 harder (not less precise, mind you), at the cost of reducing the number of operands. It does so by factoring stores in a way that disables the ability to see where loads are not *currently*, but *could*, validly be live. In other words, it was previously possible to simply determine whether two stores touched the same piece of memory by looking at the versions. This is no longer true. PRE does #1 and #3 through value numbering memory and tracking the distinct live ranges of virtual variables. SM and LIM do #3 by simply using dependency information to determine what needs to be moved together and grouping operations that vdef the same variable together. SM and LIM will thus still get *correct* information in the above example, but it's going to get false dependencies due to defs of the same mem version, unless something disambiguates between those store dependencies by looking at the underlying loads and stores involved. Previously, you could value number memory (and thus answer questions #1) simply by making a list of all the virtual variable versions associated with it, and the value number of the underlying accesses. You could see whether these value numbers would still be *obviously* be valid affected moving above or below a certain store *just by seeing what virtual variable versions were defined by that statement and whether it was a vuse version used by one of the value numbers in your statements*. It is no longer possible to get a precise answer this way when you factor stores the way you do, because the name of the virtual variable defined by a store is actually no longer actually a function of what symbols it defines anymore. Take the above case. If we simply use virtual variable versions to value number memory, we will believe that *a and *b are possible stores to the same piece of memory, even though they are not. In order to get the *correct* answer in the presence of phi nodes, we currently collect the variables the phi nodes were joining together, and consider a def of the phi result to be a def of the versions joined by the phi IE if you had phi_result_3 = <a_1, b_2> VUSE <phi_result_3> *foo = a would be considered a kill of any value numbers containing vuses of either a_1 or b_2. So how do i plan on recovering the information of figuring out where things can be moved to? Well, essentially, PRE wants a a set of "live memory variable versions" for a given statement so it can see where things can be moved to. I am going to start out at the top of the program, and walk in dominator order, generating a bitmap. We initially assume the version of everything in the load_store_value_bitmap is 0. Every time we see a vdef, we call get_loads_and_stores. For each variable in the underlying load and store bitmap, we remove the bit for the old version's id from load_store_value_bitmap, and add a bit for the new version's id. Every time we see a vuse, we use the current load_store_value_bitmap as part of it's value number, and attach the bitmap to the current statement. We attach the load_store_value_bitmap to the top and bottom of each block. Every time we see a phi node, we generate new versions for each of the loaded and stored variables of the statements that this phi node is joining. We can translate a value upwards through phis by looking at what bits in the bitmaps changed for each edge. So basically, i'm doing virtual SSA renaming and storing some of the reaching information persistently. This is how Load PRE was done in the SSAPRE infrastructure, and it works out okay. But it really does mean that we aren't using the U-D chains anymore for loads and stores, because we can't. > > > For at least all the opts i see us doing, it makes them more or less > > useless without doing things (like reexpanding them) first. Because > > this is true, I'm not sure it's a good idea at all, which is why i'm > > still on the fence. > > > But you still haven't *shown* me where the hardness or slowness comes > in. How not? As i've told you before, and showed you before, it was previously the case we could value number memory directly and determine whether it was going to be legal in a given block. mem-ssa breaks this. It was previously possible to accurately and quickly determine where stores to memory that conflicted are just by looking at versions. This is no longer true. This in turn removes the ability to determine where new loads can be placed simply by looking at VDEF RHS variables. I've reexplained this above. > Granted, the work is still unfinished so we can't really do actual > measurements. But I need to understand where the difficulties will be > so that I can accomodate the infrastructure. It's obviously not my > intent to make things harder to use. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29680 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. > -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29680