To represent call-clobbering effects we currently collect all the call-clobbered symbols and make every call a definition site for each of them. For instance, given 3 global variables X, Y and Z:
foo()
{
  X = 3
  Y = 1
  Z = X + 1
  bar ()
  Y = X + 2
  return Y + Z
}
we put the three symbols in FUD chain form as follows:
foo()
{
  # X_2 = V_MUST_DEF <X_1>
  X = 3

  # Y_4 = V_MUST_DEF <Y_12>
  Y = 1

  # Z_6 = V_MUST_DEF <Z_11>
  # VUSE <X_2>
  Z = X + 1

  # X_3 = V_MAY_DEF <X_2>
  # Y_5 = V_MAY_DEF <Y_4>
  # Z_7 = V_MAY_DEF <Z_6>
  bar ()

  # VUSE <X_3>
  # Y_8 = V_MUST_DEF <Y_5>
  Y = X + 2

  # VUSE <Y_8>
  # VUSE <Z_7>
  return Y + Z;
}
The real IL is more detailed, but that's the general idea. You'll see that the call to bar() generates one may-def for each call-clobbered variable. This prevents the optimizers from propagating known values across the call. For instance, the value '3' associated with 'X_2' cannot be propagated across the call because 'Y = X + 2' uses 'X_3'. However, X_2 can be propagated into 'Z = X + 1', so the optimizers can change that to 'Z = 4'.
However, if we had 50 call-clobbered variables, that call to bar() would generate 50 may-defs. This can quickly grow out of hand for real programs, so we have a throttling mechanism which puts all the call-clobbered variables inside the same bag to reduce the number of virtual operands.
What we do is create a single symbol (.GLOBAL_VAR or GV) and then we alias every call-clobbered variable to it. So in this case, we end up with:
foo()
{
  # GV_2 = V_MAY_DEF <GV_1>
  X = 3

  # GV_4 = V_MAY_DEF <GV_2>
  Y = 1

  # GV_5 = V_MAY_DEF <GV_4>
  Z = X + 1

  # GV_6 = V_MAY_DEF <GV_5>
  bar ()

  # GV_7 = V_MAY_DEF <GV_6>
  Y = X + 2

  # VUSE <GV_7>
  return Y + Z;
}
So, we've reduced the number of virtual operands, but we've lost transformations we could make before: we are no longer able to propagate 3 into 'Z = X + 1'.
We discussed this briefly on IRC; Dan described what the IBM compiler does, but he thinks it may be too intrusive given GCC's implementation of FUD chains.
Another idea is already documented in a FIXME note in maybe_create_global_var: create a separate GV for each type alias set. But that would not help in this case if X, Y and Z are all of the same type.
I've started thinking a little bit about this, and perhaps we could do something along the lines of the idea that Jan proposed a few months ago (I don't recall whether it was a private or public message, Jan?). Jan's original proposal drastically reduced the number of vops, but it imposed a similar restriction on the optimizers: things would be too interconnected.
The variation that I have in mind is to consider every load and store operation a *load* of GLOBAL_VAR. Only call sites generate new names for GLOBAL_VAR. Something like this:
foo()
{
  # X_2 = V_MUST_DEF <X_1>
  # VUSE <GV_13>
  X = 3

  # Y_4 = V_MUST_DEF <Y_12>
  # VUSE <GV_13>
  Y = 1

  # Z_6 = V_MUST_DEF <Z_11>
  # VUSE <X_2>
  # VUSE <GV_13>
  Z = X + 1

  # GV_14 = V_MAY_DEF <GV_13>
  bar ()

  # VUSE <X_2>
  # Y_8 = V_MAY_DEF <Y_4>
  # VUSE <GV_14>
  Y = X + 2

  # VUSE <Y_8>
  # VUSE <Z_6>
  # VUSE <GV_14>
  return Y + Z;
}
Advantages of this scheme:
- It doesn't unnecessarily tie up the optimizers. We can still propagate 3 into Z = X + 1 (as it loads X_2 and GV_13).
- Every call site will always generate exactly one V_MAY_DEF.
- We can naturally block propagations across call sites: notice how the load of X at 'Y = X + 2' loads from X_2 and GV_14, while the previous store had GV_13.
Disadvantages:
- It generates an additional vop per load/store. In some cases that may generate more virtual operands than we generate now, though I think that would be rare.
- Those VUSEs for GV at store sites look wonky.
- If there are no uses of X, Y and Z after the call to bar, DCE will think that those stores are dead. We would have to hack DCE to somehow see the call to bar() as a user of those stores.
The last problem is the one I have the most reservations about. We could relate the stores to the call sites some other way outside the SSA web, but I don't much like any kind of implicit data flow relationships.
Thoughts? Ideas?
Thanks. Diego.