On Dec 5, 2007, Diego Novillo <[EMAIL PROTECTED]> wrote:

> On 11/25/07 3:43 PM, Mark Mitchell wrote:
>> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
>> GCC developer with an interest in improving the compiler in the same
>> way that you're trying to do) is that you stop writing code and start
>> writing a paper about what you're trying to do.
>>
>> Ignore the implementation.  Describe the problem in detail.  Narrow
>> its scope if necessary.  Describe the success criteria in detail.
>> Ideally, the success criteria are mechanically checkable properties:
>> i.e., given a C program as input, and optimized code + debug
>> information as output, it should be possible to algorithmically prove
>> whether the output is correct.

> Yes, please.  I would very much like to see an abstract design
> document on what you are trying to accomplish.

Other than the ones I've already posted, here's one:

  http://dwarfstd.org/Dwarf3Std.php

Seriously.  There is a standard for this stuff.  My ultimate goal in
this project is that we comply with it, at least as far as emitting
debug information for the location of variables is concerned.

Here are some relevant postings on design strategies, rationales and
goals:

  http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
  http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

> I would like to see exactly what Mark is asking for.  Perhaps a
> presentation in next year's Summit?

Sure, if there's interest, I could plan on doing that.  I could use
sponsors, BTW; I haven't discussed this with my employer, and writing
articles and giving presentations are not part of the assignment I was
given.  Anyhow, by the time of next year's Summit, I hope this will
mostly be old news.

> I don't think I understand the goal of the project.

Follow the standard, as in:

(1) emit debug information that is correct (standard-compliant), as in,
if we emit some piece of debug information, it reflects reality, rather
than being a sometimes distant approximation of some past reality long
destroyed by an optimization pass; and

(2) emit debug information that is more complete, as in, we currently
fail to emit a lot of debug information that we could, because
optimization passes fail to maintain the information needed to keep
track of the locations of variables.

> "Correct debugging info" means little, particularly if you say that
> it's not debuggers that you are thinking about.

Thinking of the debuggers is a mistake.  We don't think of specific
compilers when reading a programming language standard.  We don't think
of specific processors when reading an ISA or ABI specification.  Even
when we read documentation specific to a processor, we still don't
think of its internal implementation details in order to write a
compiler for it; even the scheduling properties are abstracted out in
the design specification and optimization guidelines.

When someone finds that the compiler deviates from one of these
standards, we just cite chapter and verse of the relevant standard, and
people see there's a bug.  Why should debug information standards be
treated any differently?

> It's certainly worrisome that your implementation seems to be
> intrusive to the point of brittleness.

What part of the intrusiveness are you concerned about?
The change of INSN_P such that it covers DEBUG_INSN_P too in the
supported range?  Or the few changes that revert to the original
INSN_P, in the few exceptions in which DEBUG_INSN_P is not to be
handled as an INSN?

I've heard this "intrusiveness" argument raised so many times, by so
many people who claim not to have been able to keep up with the thread,
and not to have looked at the patches at all, that I'm more and more
convinced it's fear of the unknown rather than any actual rational
evaluation of the impact of the changes.  Seriously.  Have a look at
the patches and tell me what in them you regard as intrusive.

We're talking about infrastructure here, needed to fix GCC's
carelessness about maintaining a mapping between source and
implementation concepts, a carelessness that went on for years and
years while optimizations were added and debug information was
degraded.  At some point you have to face reality and see that such
information isn't kept around by magic: it takes some effort, and this
effort is needed at every location where there are changes that might
affect debug information.  And that's pretty much everywhere.  Even if
we had consistent interfaces for making some changes, such as variable
renaming, substitution, etc., this would only cover a small fraction of
the data a debug info generator needs: it needs higher-level
information than that, especially in rtl, where transformations, for
historical reasons, are messier than in the tree IL.

So, the approach I've taken is to use the strength of the problem
against itself: take advantage of the fact that optimizers already know
how to perform the transformations they need to keep things consistent,
and represent debug information in a way that, to them, looks just like
any other use, so they adjust it likewise.  (There's a small sketch of
what this looks like further down.)  And then, on top of that, handle
the few exceptions in which the optimizer needs to do something
cleverer, because the transformation it performs wouldn't work when,
say, there's more than one use.

> Will every new optimization need to think about debug information
> from scratch and refrain from doing certain transformations?

Refraining from doing certain transformations would be wrong.  We don't
want debug information to affect code generation, and we don't want it
to reduce the amount of optimization you can do.  So, you optimize
away, and if you find that you can't keep track of debug information,
you mark stuff as unavailable, or, most likely, the safety nets in
place will do that for you, rather than taking the current approach, in
which we silently corrupt debug information.

Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking is a no-brainer in most
cases; in a few, you have to work a bit harder to keep information
around rather than simply noting it as unavailable.  But it has never
required optimizations to be disabled, and it must not do so.  In fact,
in a few cases, I noticed we were missing trivial optimizations and
fixed them.

> In my simplistic view of this problem, I've always had the idea that
> -O0 -g means "full debugging bliss", -O1 -g means "tolerable
> debugging" (symbols shouldn't disappear, for instance, though they do
> now) and -O2 -g means "you can probably know what line+function you're
> executing".

I've never seen this documented as such, and we've never worked toward
these stated goals.
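Here's the sketch I promised above of how representing debug bindings
as ordinary-looking uses plays out.  This is a hand-written
illustration of mine, not a literal dump from the patches, and the
exact annotation syntax is only meant to be suggestive:

  /* Source: */
  int
  f (int a)
  {
    int x = a;        /* a simple copy the optimizers will remove */
    return x + 1;
  }

  /* Before copy propagation, the IL carries an annotation binding the
     user variable "x" to the value it currently has; to the pass, the
     annotation's right-hand side is just another use of x_1:

         x_1 = a_2;
         # DEBUG x => x_1
         _3 = x_1 + 1;
         return _3;

     When the pass propagates a_2 into the uses of x_1 and deletes the
     now-dead copy, the same substitution machinery rewrites the
     annotation, so the binding stays correct:

         # DEBUG x => a_2
         _3 = a_2 + 1;
         return _3;

     If a transformation can't express the value at all, the annotation
     is reset to "value unavailable" instead of being left stale.  */

The annotations are only read by the debug info generator at the very
end; the intent is that they never affect code generation, they just
ride along with the transformations.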
However, I see that, underlying all of this, we should be concerned
about emitting debug information that is correct, i.e., we should never
emit information that says the location of FOO is BAR while it's
actually at BAZ.

I've seen many people (including myself, in a distant past) claim that
imprecise information is better than no information.  I've learned
better.  Debug information consumers are often equipped with heuristics
to fill in common gaps in debug information.  But if the information is
there, and wrong, the heuristics that might very well have worked are
disabled in favor of the incorrect information, and then the whole
system (debuggers, monitors, etc., along with the program) misbehaves.
And even when heuristics don't exist and the information is gone, it's
better to tell the user "I don't know how to get you that" than to hand
them something other than what they need (e.g., an incorrect variable
location).

> But you seem to be addressing other problems.  And it even seems to me
> that you want debugging information that is capable of deconstructing
> arbitrary transformations done by the optimizers.

No.  I don't see where this notion came from, but it appears to be
quite widespread.

Omitting certain pieces of debug information is almost always correct,
since most debug info attributes are optional.  But emitting
information that doesn't reflect the program is always incorrect.  So,
if you perform an arbitrary transformation that is too hard to
represent in debug information, that's fine, just throw the information
away.  The debug information might become less complete, and therefore
less useful, but at least it won't induce errors elsewhere.

The parallel I draw is that emitting an optional piece of debug
information is like applying an optional optimization.  If it's
correct, and it's not too expensive, go for it.  But if it's going to
get you the wrong output, it's broken, so don't do it.

-- 
Alexandre Oliva                 http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member                http://www.fsfla.org/
Red Hat Compiler Engineer       [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}