Re: Designs for better debug info in GCC

Alexandre Oliva Fri, 23 Nov 2007 15:42:18 -0800

On Nov 13, 2007, Mark Mitchell <[EMAIL PROTECTED]> wrote:

> Alexandre Oliva wrote:
>>> What I don't understand is how it's actually going to work.  What
>>> are the notes you're inserting?
>> 
>> They're always of the form
>> 
>> DEBUG user-variable = expression


> Good, I understand that now.

> Why is this better than associating user variables with assignments?

I've already explained that, but let me try to sum it up again.

If we annotate assignments, then not only do the annotations move
around along with assignments (I don't think that's desirable), but
when we optimize such assignments away, the annotations are either
dropped or have to stand on their own.

Since dropping annotations and moving them around are precisely
opposed the goal of making debug information accurate, then keeping
the annotations in place and enabling them to stand on their own is
the right thing to do.

Now, since we have to enable them to stand on their own, then we're
faced with the following decision: either we make that the canonical
annotation representation all the way from the beginning, or we
piggyback the annotations on assignments until they're moved or
removed, at which point they become stand-alone annotations.  The
former seems much more maintainable and simpler to deal with, and I
don't see that there's a significant memory or performance penalty to
this.

>> That said, growing SET to add to it a list of variables (or components
>> thereof) that the variable is assigned to could be made to work, to
>> some extent.  But when you optimize away such a set, you'd still have
>> to keep the note around

> Why?  It seems to me that if we're no longer doing the assignment, then
> the location where the value of the user variable can be found (if any)
> is not changing at this point.

The thing is that the *location* of the user variable is changing at
that point.  Either because its previous value was unavalable, or
because it had remained only at a different location.  Only at the
point of the assignment should we associate the variable with the
location that holds its current value.

>> (set (reg i) (const_int 3)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem f))
>> (set (reg i) (const_int 7)) ;; assigns to i
>> (set (reg i) (const_int 2)) ;; assigns to i
>> (set (reg P1) (reg i))
>> (call (mem g))
>> 
>> could have been optimized to:
>> 
>> (set (reg P1) (const_int 3))
>> (call (mem f))
>> (set (reg P1) (const_int 2))
>> (call (mem g))
>> 
>> and then you wouldn't have any debug information left for variable i.

> Actually, you would, in the method I'm making up.  In particular, both
> of the first two lines in the top example (setting "i" and setting "P1")
> would be marked as providing the value of the user variable "i".

Yes, this works in this very simple case.  But it doesn't when i is
assigned, at different points, to the values of two separate
variables, that are live and initialized much earlier in the program.
Using hte method you seem to be envisioning would extend the life of
the binding of variable 'i' to the life of the two other variables,
ending up with two overlapping and conflicting live ranges for i, or
it would have to drop one in favor of the other.  You can't possibly
retain correct (non-overlapping) live ranges for both unless you keep
notes at the points of assignment.

To make the example clear, consider:

(set (reg x [x]) ???1)
(set (reg y [y]) ???2)
(set (reg i [i]) (reg x [x]))
(set (reg P1) (reg i))
(call (mem f))
(set (reg i [i]) (reg y [y]))
(call (mem g))
(set (reg P1) (reg i))
(call (mem f))

if it gets optimized to:

(set (reg P1 [x, i]) ???1)
(set (reg y [y, i]) ???2)
(call (mem f))
(call (mem g))
(set (reg P1) (reg y))
(call (mem f))

then we lose.  There's no way you can emit debug information for i
based on these annotations such that, at the call to g, the value of i
is correct.  Even if you annotate the copy from y to P1, you still
won't have it right, and, worse, you won't even be able to tell that,
before the call to g, i should have held a different value.  So you'll
necessarily emit incorrect debug information for this case: you'll
state i still holds a value at a point in which it shouldn't hold that
value any more.  This is worse that stating you don't know what the
value of i is.

> What I'm suggesting is that this is something akin to a dataflow
> problem.  We start by marking user variables, in the original TREE
> representation.  Then, any time we copy the value of a user variable, we
> know that what we're doing is providing another place where we can find
> the value of that user variable.  Then, when generating debug
> information, for every program region, we can find the location(s) where
> the value of the user variable is available, and we can output any one
> of those locations for the debugger.

That's exactly what I have in mind.

> This method gives us accurate debug information, in the sense that if we
> say that the value of V is at location X, then it is in fact there, and
> the value there is a value assigned to V.  It does not necessarily give
> us complete information, though, in that there may be times when the
> value is somewhere and we don't realize it.  Like, if:

>   x = y + 3;
>   f(x);

> is optimized to:

>   f(y + 3)

> Then, right before the call to "f", we might not know that the value of
> "x" is available, or we might say that "x" has a previous value.

It's not just previous value.  It can be arbitrarily wrong value too.
Consider again the conditional case:

foo (int x, int y, int z)
{
  int c = z;
  whatever0(c);
  c = x;
  whatever1();
  if (some_condition)
    {
      whatever2();
      c = y;
      whatever3();
    }
  whatever4(c);
}

In the tree representation, the assignments to c just go away, in
favor of a PHI node that takes x from the !some_condition block and y
from the some_condition block.

So, you could recover the correct value for c at the PHI node, but
since the other assignments are all dropped, you can at best figure
out that you don't know the value held by c between whatever1() and
the PHI node, and at worst claim that it's z or x or y, or even both x
and y, depending on how you update the notes.

> method I've proposed will say that the value is unavailable [when
> it's a constant and the assignment is optimized away]

I don't see how, unless you keep a note saying at least that the
variable was modified to an unknown value at that point.

> I don't see that as an unreasonable limitation when debugging
> optimized code, but that's open for debate.

If it did that reliably, then it would be a reasonable limitation,
indeed, for it would be accurate, even if incomplete.  It would no
longer be a correctness issue, just a quality of implementation issue.
But then, I'm yet to understand how you'd generate debug info to note
that the value is unavailable if you don't keep notes around to
indicate the point of the assignment that was optimized away.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}

Re: Designs for better debug info in GCC

Reply via email to