On Nov 13, 2007, Mark Mitchell <[EMAIL PROTECTED]> wrote:

> Alexandre Oliva wrote:
>>> What I don't understand is how it's actually going to work. What
>>> are the notes you're inserting?
>>
>> They're always of the form
>>
>>   DEBUG user-variable = expression
> Good, I understand that now.

> Why is this better than associating user variables with assignments?

I've already explained that, but let me try to sum it up again.

If we annotate assignments, then not only do the annotations move
around along with assignments (I don't think that's desirable), but
when we optimize such assignments away, the annotations are either
dropped or have to stand on their own.

Since dropping annotations and moving them around are precisely
opposed to the goal of making debug information accurate, keeping the
annotations in place and enabling them to stand on their own is the
right thing to do.

Now, since we have to enable them to stand on their own, we're faced
with the following decision: either we make that the canonical
annotation representation all the way from the beginning, or we
piggyback the annotations on assignments until they're moved or
removed, at which point they become stand-alone annotations.

The former seems much more maintainable and simpler to deal with, and
I don't see that there's a significant memory or performance penalty
to this.

>> That said, growing SET to add to it a list of variables (or components
>> thereof) that the variable is assigned to could be made to work, to
>> some extent. But when you optimize away such a set, you'd still have
>> to keep the note around

> Why? It seems to me that if we're no longer doing the assignment, then
> the location where the value of the user variable can be found (if any)
> is not changing at this point.

The thing is that the *location* of the user variable is changing at
that point. Either because its previous value was unavailable, or
because it had remained only at a different location. Only at the
point of the assignment should we associate the variable with the
location that holds its current value.
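To make the difference between the two representations concrete,
here's a rough sketch in Python. It is only a toy model, not GCC code:
the tuple layout and the names delete_sets_to and bindings_at_calls
are made up for illustration. It runs the same "remove the assignments
to i" transformation over a stream in which the annotation rides on
the assignment and over one in which it is a note of its own.

```python
# Toy model, not GCC code: an insn stream is a list of tuples.
#   ("set", dest, src, user_var_or_None)  assignment, optionally annotated
#   ("debug", user_var, expr)             stand-alone note: DEBUG var = expr
#   ("call", name)

def delete_sets_to(insns, reg):
    """Stand-in for an optimization that removes every assignment to reg."""
    return [i for i in insns if not (i[0] == "set" and i[1] == reg)]

def bindings_at_calls(insns):
    """Return {callee: user-variable bindings known at that call}."""
    current, seen = {}, {}
    for insn in insns:
        if insn[0] == "debug":
            current[insn[1]] = insn[2]
        elif insn[0] == "set" and insn[3] is not None:
            current[insn[3]] = insn[2]
        elif insn[0] == "call":
            seen[insn[1]] = dict(current)
    return seen

# i = 3; f(i); i = 2; g(i), with the annotation riding on the assignment
piggybacked = [
    ("set", "i", 3, "i"), ("set", "P1", 3, None), ("call", "f"),
    ("set", "i", 2, "i"), ("set", "P1", 2, None), ("call", "g"),
]

# the same code, with the annotation kept as a note of its own
standalone = [
    ("set", "i", 3, None), ("debug", "i", 3),
    ("set", "P1", 3, None), ("call", "f"),
    ("set", "i", 2, None), ("debug", "i", 2),
    ("set", "P1", 2, None), ("call", "g"),
]

lost = bindings_at_calls(delete_sets_to(piggybacked, "i"))
kept = bindings_at_calls(delete_sets_to(standalone, "i"))
print(lost)  # nothing is known about i at either call
print(kept)  # i is still bound at both calls
```

In the piggybacked stream the bindings vanish together with the
assignments, whereas the stand-alone notes stay where the assignments
used to be. A real implementation would of course have to do much more
than this, e.g. keep the expressions in the notes up to date as
optimizers rewrite them.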
>>   (set (reg i) (const_int 3)) ;; assigns to i
>>   (set (reg P1) (reg i))
>>   (call (mem f))
>>   (set (reg i) (const_int 7)) ;; assigns to i
>>   (set (reg i) (const_int 2)) ;; assigns to i
>>   (set (reg P1) (reg i))
>>   (call (mem g))
>>
>> could have been optimized to:
>>
>>   (set (reg P1) (const_int 3))
>>   (call (mem f))
>>   (set (reg P1) (const_int 2))
>>   (call (mem g))
>>
>> and then you wouldn't have any debug information left for variable i.

> Actually, you would, in the method I'm making up. In particular, both
> of the first two lines in the top example (setting "i" and setting "P1")
> would be marked as providing the value of the user variable "i".

Yes, this works in this very simple case. But it doesn't when i is
assigned, at different points, the values of two separate variables
that are live and initialized much earlier in the program.

Using the method you seem to be envisioning would extend the life of
the binding of variable 'i' to the life of the two other variables,
ending up with two overlapping and conflicting live ranges for i, or
it would have to drop one in favor of the other. You can't possibly
retain correct (non-overlapping) live ranges for both unless you keep
notes at the points of assignment.

To make the example clear, consider:

  (set (reg x [x]) ???1)
  (set (reg y [y]) ???2)
  (set (reg i [i]) (reg x [x]))
  (set (reg P1) (reg i))
  (call (mem f))
  (set (reg i [i]) (reg y [y]))
  (call (mem g))
  (set (reg P1) (reg i))
  (call (mem f))

if it gets optimized to:

  (set (reg P1 [x, i]) ???1)
  (set (reg y [y, i]) ???2)
  (call (mem f))
  (call (mem g))
  (set (reg P1) (reg y))
  (call (mem f))

then we lose. There's no way you can emit debug information for i
based on these annotations such that, at the call to g, the value of i
is correct. Even if you annotate the copy from y to P1, you still
won't have it right, and, worse, you won't even be able to tell that,
before the call to g, i should have held a different value.
So you'll necessarily emit incorrect debug information for this case:
you'll state i still holds a value at a point in which it shouldn't
hold that value any more. This is worse than stating you don't know
what the value of i is.

> What I'm suggesting is that this is something akin to a dataflow
> problem. We start by marking user variables, in the original TREE
> representation. Then, any time we copy the value of a user variable, we
> know that what we're doing is providing another place where we can find
> the value of that user variable. Then, when generating debug
> information, for every program region, we can find the location(s) where
> the value of the user variable is available, and we can output any one
> of those locations for the debugger.

That's exactly what I have in mind.

> This method gives us accurate debug information, in the sense that if we
> say that the value of V is at location X, then it is in fact there, and
> the value there is a value assigned to V. It does not necessarily give
> us complete information, though, in that there may be times when the
> value is somewhere and we don't realize it. Like, if:

>   x = y + 3;
>   f(x);

> is optimized to:

>   f(y + 3)

> Then, right before the call to "f", we might not know that the value of
> "x" is available, or we might say that "x" has a previous value.

It's not just a previous value. It can be an arbitrarily wrong value
too. Consider again the conditional case:

  foo (int x, int y, int z)
  {
    int c = z;
    whatever0(c);
    c = x;
    whatever1();
    if (some_condition)
      {
        whatever2();
        c = y;
        whatever3();
      }
    whatever4(c);
  }

In the tree representation, the assignments to c just go away, in
favor of a PHI node that takes x from the !some_condition block and y
from the some_condition block.
So, you could recover the correct value for c at the PHI node, but
since the other assignments are all dropped, you can at best figure
out that you don't know the value held by c between whatever1() and
the PHI node, and at worst claim that it's z or x or y, or even both x
and y, depending on how you update the notes.

> method I've proposed will say that the value is unavailable [when
> it's a constant and the assignment is optimized away]

I don't see how, unless you keep a note saying at least that the
variable was modified to an unknown value at that point.

> I don't see that as an unreasonable limitation when debugging
> optimized code, but that's open for debate.

If it did that reliably, then it would be a reasonable limitation,
indeed, for it would be accurate, even if incomplete. It would no
longer be a correctness issue, just a quality of implementation issue.

But then, I'm yet to understand how you'd generate debug info to note
that the value is unavailable if you don't keep notes around to
indicate the point of the assignment that was optimized away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}