Re: Designs for better debug info in GCC

Michael Matz Thu, 08 Nov 2007 02:23:31 -0800

Hi,

On Wed, 7 Nov 2007, Alexandre Oliva wrote:


> > x and y at the appropriate part.  Whatever holds 'x' at a point (SSA 
> > name, pseudo or mem) will also mention that it holds 'c'.  At a later 
> > point whichever holds 'y' will also mention it holds 'c'.
> 
> I.e., there will be two parallel locations throughout the entire 
> function that hold the value of 'c'.

No.  For some PC locations the location of 'c' will happen to be the same 
as the one holding 'x', and for a different set of PC locations it will be 
the one also holding 'y'.  The question "what's in 'c'?" from a debugger 
only makes sense relative to a particular program counter, and depending 
on that PC the location of 'c' will differ.  In the case above both 
locations might exist in parallel throughout the entire function, but they 
don't hold 'c' in parallel.
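To make the PC dependence concrete, here is a minimal sketch in C of a location-list lookup under a heavily simplified model: each entry maps a half-open PC range to wherever 'c' lives in that range.  The struct, lookup_location(), the PC values and the location names are all hypothetical and invented for illustration; real DWARF location lists are encoded quite differently.

```c
#include <stddef.h>

/* Hypothetical, simplified location-list entry: for PCs in [low, high)
   the variable lives in LOC.  This models only the lookup, not the
   real DWARF encoding. */
struct loc_entry {
    unsigned long low;   /* first PC covered (inclusive) */
    unsigned long high;  /* end of the range (exclusive) */
    const char *loc;     /* where the value lives in that range */
};

/* Return where the variable lives at PC, or NULL when no range covers
   PC (a debugger would then report "optimized out"). */
static const char *
lookup_location (const struct loc_entry *list, size_t n, unsigned long pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].low && pc < list[i].high)
            return list[i].loc;
    return NULL;
}

/* Made-up list for 'c': first it shares the place holding 'x',
   later the place holding 'y'.  PCs and names are invented. */
static const struct loc_entry c_locations[] = {
    { 0x100, 0x120, "place also holding x" },
    { 0x120, 0x140, "place also holding y" },
};
```

Asking for 'c' at PC 0x110 yields the place also holding 'x', at 0x130 the place also holding 'y', and at 0x150 no location at all.  Both places can exist for the whole function; only one of them is 'c' at any given PC.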

> Something like:
> 
> f(int x /* but also c */, int y /* but also c */) { /* other vars */

"int x /* but also c */, int y /* but also c */" implies that x == y 
already, at which point the compiler will most probably have allocated 
just one place for x and y (and c) anyway ...

>  do_something_with(x, ...); // doesn't touch x or y
>  do_something_else_with(y, ...); // doesn't touch x or y
> 
> Now, what will you get if you 'print c' in the debugger (or if any
> other debug info evaluator needs to tell what the value of user
> variable c is) at a point within do_something_with(c,...) or
> do_something_else_with(c)?

... so the answer would be "whatever is in that common place for x, y and 
c".  If the compiler did not allocate one place for x and y, the answer 
would still be "whatever is in the place of 'y'", because that value is 
live, unlike 'x'.

> Now consider that f is inlined into the following code:
> 
> int g(point2d p) {
>   /* lots of code */
>   f(p.x, p.y);
>   /* more code */
>   f(p.y, p.x);
>   /* even more code */
> }
> 
> g gets fully scalarized, so, before inlining, we have:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   f(p$x, p$y);
>   /* more code */
>   f(p$y, p$x);
>   /* even more code */
> }
> 
> after inlining of f, we end up with:
> 
> int g(point2d p) {
>   int p$x = p.x, int p$y = p.y;
>   /* lots of code */
>   { int f()::x.1 /* but also f()::c.1 */ = p$x, f()::y.1 /* but also f()::c.1 */ = p$y;

Here you punt.  How come f::c is actually set to p$x?  I don't see any 
assignment to c in f, and in fact no declaration of it either.  If you had 
one, _that_ would be the place where the connection between p$x and 'c' 
would have been made, and everything would fall into place.

>     { /* other vars */
>       do_something_with(f()::x.1, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.1, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { int f()::x.2 /* but also f()::c.2 */ = p$y, f()::y.2 /* but also f()::c.2 */ = p$x;
>     { /* other vars */
>       do_something_with(f()::x.2, ...); // doesn't touch x or y
>       do_something_else_with(f()::y.2, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> then, we further optimize g and get:
> 
> int g(point2d p) {
>   int p$x /* but also f()::x.1, f()::c.1, f()::y.2, f()::c.2 */ = p.x;
>   int p$y /* but also f()::y.1, f()::c.1, f()::x.2, f()::c.2 */ = p.y;
>   /* lots of code */
>   { { /* other vars */
>       do_something_with(p$x, ...); // doesn't touch x or y
>       do_something_else_with(p$y, ...); // doesn't touch x or y
>   } }
>   /* more code */
>   { { /* other vars */
>       do_something_with(p$y, ...); // doesn't touch x or y
>       do_something_else_with(p$x, ...); // doesn't touch x or y
>   } }
>   /* even more code */
> }
> 
> and now, if you try to resolve the variable name 'c' to a location or
> a value within any of the occurrences of do_something_*with(), what do
> you get?  What ranges do you generate for each of the variables
> involved?

It's not possible that p$x _and_ p$y are f()::c.1 at the same time, so the 
above examples are all somehow invalid.  Unless p$x and p$y hold the same 
value; in that case it's enough, and exactly correct, if the range of 
f()::c.1 covers the whole body of your function 'g', referring to the one 
location shared by f()::c.1, f()::c.2, p$x and p$y.

> Unfortunately, this mapping is not biunivocal.  The chosen 
> representation is fundamentally lossy.

What's fundamentally lossy are the transformations done by the compiler.  
E.g. in this simple case:

int f(int y) {
  int x = 2 * y;
  return x + 2;
}

If the compiler forward-propagates 2*y into its single use and simplifies:

  return (y+1)*2;

then the value 2*y is never actually calculated anymore: not in any 
register, not in any local variable, nowhere.  There's no way debug 
information could rectify this loss of information in general.  As DWARF 
is capable of encoding complete expressions, it would be possible to 
express it in this case, because the inverse of the above function is 
easily determined.  In the case of more complicated expressions that's no 
longer possible and you lose.
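As a rough illustration of the "complete expressions" point, the following C sketch evaluates a tiny stack-based expression in the spirit of DWARF location expressions, describing x as y * 2, assuming y itself is still live in (hypothetical) register 0.  The opcode set, its encoding and the helper names are invented for this example; they only echo DW_OP_breg<n>, DW_OP_lit2 and DW_OP_mul and are not the real DWARF encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes loosely modelled on DW_OP_breg<n>, DW_OP_lit<n> and
   DW_OP_mul; the encoding is invented for this sketch. */
enum op_kind { OP_REG, OP_LIT, OP_MUL };

struct op {
    enum op_kind kind;
    long arg;            /* register number (OP_REG) or constant (OP_LIT) */
};

#define STACK_MAX 16

/* Evaluate EXPR (N ops) against register file REGS; the result is the
   value left on top of the stack. */
static long
eval_expr (const struct op *expr, size_t n, const long *regs)
{
    long stack[STACK_MAX];
    size_t sp = 0;

    for (size_t i = 0; i < n; i++) {
        switch (expr[i].kind) {
        case OP_REG:                 /* push a register's contents */
            assert (sp < STACK_MAX);
            stack[sp++] = regs[expr[i].arg];
            break;
        case OP_LIT:                 /* push a constant */
            assert (sp < STACK_MAX);
            stack[sp++] = expr[i].arg;
            break;
        case OP_MUL:                 /* pop two values, push their product */
            assert (sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        }
    }
    assert (sp >= 1);
    return stack[sp - 1];
}

/* After "return (y+1)*2;" nothing holds 2*y, but if y is still live
   (assumed: in register 0), x can be described as y * 2. */
static const struct op x_from_y[] = {
    { OP_REG, 0 },               /* push y */
    { OP_LIT, 2 },               /* push 2 */
    { OP_MUL, 0 },               /* x = y * 2 */
};
```

With 21 in register 0, evaluating x_from_y gives 42, the value x would have had.  Such a description only exists while the relation to a surviving value stays simple enough to write down.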

So, if the value is never ever computed anymore, debug information won't 
help you.  You either have to force the value you're interested in to be 
live, or live with the imprecision.

Forcing some values live is possible, but it is independent of generating 
debug information that is as exact as possible.  It must be independent, 
because forcing values live is going to change the code, something which 
mere generation of debug information is not allowed to do.

So, our mapping is as accurate as yours.  If a value is computed in some 
place which can be traced back to some user-declared variable, then this 
will be expressed.  If the value is not available, then of course it also 
can't be reflected in the debug information (except as "optimized out").  
It seems that in your branch you also force some values live, IIUC.  
That's okay, but it has nothing to do with generating precise debug 
information as shown above.

Even for forcing values live there are easier mechanisms.  We, for 
instance, experimented with volatile asms which simply refer to the values 
in question (and, unsurprisingly, we were also interested in formal 
arguments of inlined functions):

  int f (int x) {
    force_use (x);
    /* ... old body ... */
  }

You have to switch off any propagation into force_use(x), so that the 
original value of 'x' and the connection to the DECL of 'x' live until the 
end of the compilation pipeline.  That's a rather simple hack that does 
exactly what's necessary: it forces GCC to actually have a place for the 
value of 'x' at the function entry point, which also survives inlining.
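A minimal sketch of such a force_use(), assuming GCC-style extended asm: an empty volatile asm statement that takes the value as an input operand, so the value has to exist in a register or memory slot at that point.  The macro body and the "rm" constraint are assumptions for illustration, not the exact code from the experiment, and this covers only the "keep a place for the value" half; switching off propagation into force_use(x) needs changes inside the compiler itself.

```c
/* Hypothetical force_use(): an empty asm with V as an input operand.
   "rm" asks for V in a register or in memory, so the value has to be
   materialized here; __volatile__ keeps the asm from being deleted. */
#define force_use(v) __asm__ __volatile__ ("" : : "rm" (v))

/* The forward-propagation example from earlier, with both values forced. */
int
f (int y)
{
    force_use (y);      /* keep the incoming argument available */
    int x = 2 * y;
    force_use (x);      /* 2*y now has to be computed and held somewhere */
    return x + 2;
}
```

The function still returns 2*y + 2; the asms only oblige the compiler to materialize y and 2*y at those points, which changes the generated code, exactly why this is kept separate from debug info generation.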


Ciao,
Michael.
