On Jun  6, 2009, Eric Botcazou <ebotca...@adacore.com> wrote:

>> So if I understand the above right then VTA is a new source of
>> code-generation differences with -g vs. -g0.  A possibly quite
>> bad one (compared to what we have now).

> IIUC it's a paradigm shift: currently the absence of differences in the 
> generated code is guaranteed by the absence of differences in the IR all the 
> way from the initial GENERIC down to the final RTL.  In other words, unless a 
> pass makes an active mistake, it preserves the invariant.

It would be nice if it worked this way, but the dozens of patches to fix
-g/-g0 compile differences I posted over the last several months show
it's really not that simple, because the codegen IR does not tell the
whole story.  We have what amount to IR extensions for debug info, for types
and templates, for aliasing information, even for GC of internal data
structures, and all of these do affect codegen, sometimes in very subtle
ways.

I agree it would be nice if things were as you describe above, but
that's not where we are, and in more ways than the ones I mentioned
above.

Speaking specifically of debug information, the little attention given
to preserving information needed to generate correct debug info means
that introducing errors is not just a matter of active mistakes.  Most
of the debug information errors we have now do not follow from actively
breaking debug info data structures, but rather from passively failing
to keep them up to date, or from the outright absence of data structures
for that purpose.

Now, once we realize we need additional data structures to retain a
correct mapping between source-level constructs and the result of
transformations that occur throughout compilation, it shouldn't be hard
to realize that there are two options:

1. maintain those IR data structures regardless of whether we're
emitting debug information, spending compile time and memory to keep
the computation identical, so as to avoid risk, and in the end discard
the results the user didn't ask for; or

2. avoid the unnecessary computation and memory use by accepting that
there are going to be IR differences between compilations with and
without -g, and work towards minimizing the risks of such differences
(a sketch of option 2 follows below).
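
To make the difference concrete, here's a minimal sketch of how option 2
might look.  The names (emit_debug_info, note_assignment, debug_bind) are
made up for illustration and are not GCC internals.  All debug-only
bookkeeping sits behind a single guard, so a -g0 compilation spends no
time or memory on it; the price is exactly the one stated above: the side
data, and anything that accidentally leans on it, differs between -g and
-g0.

  /* Sketch only: hypothetical names, not actual GCC code.  Option 2 guards
     all debug-only bookkeeping so that a -g0 compilation never pays for it.  */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct debug_bind              /* debug-only side data: decl -> value mapping */
  {
    int decl_uid;
    int value_id;
    struct debug_bind *next;
  };

  static bool emit_debug_info;   /* set from -g vs. -g0 */
  static struct debug_bind *binds;

  static void
  note_assignment (int decl_uid, int value_id)
  {
    if (!emit_debug_info)
      return;                    /* -g0: no allocation, no bookkeeping at all */

    struct debug_bind *b = malloc (sizeof *b);
    b->decl_uid = decl_uid;
    b->value_id = value_id;
    b->next = binds;
    binds = b;
  }

  int
  main (void)
  {
    emit_debug_info = true;      /* pretend -g was given */
    note_assignment (1, 42);
    printf ("bindings recorded: %s\n", binds ? "yes" : "no");
    return 0;
  }

Option 1 would simply drop the guard and maintain the list
unconditionally, discarding it at the end of a -g0 compilation.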


I can certainly understand the wish to keep debug info IR out of sight,
and have it all be maintained sort of by magic, without developers even
having to think about it.  While I share that wish and even
tried to satisfy it in the design, I've come to the conclusion that it
can't be done.  And it's not just an “it can't be done without major
surgery in GCC as it is today”, it's an “it can't be done at all”.  Let
me share with you the example by which I proved it to myself.

Consider an IR API that offers interfaces to remove and add operations
to sequences.  If you want to move an operation, you remove it from its
original position and add it to another.  The problem is that, the moment
you remove it, any debug info monitor running behind the scenes has to
behave as if the operation were gone for good, adjusting the debug info
IR so as to mitigate the impact of its absence, and keeping no references
to it.  Then, when that operation is re-added, the debug info monitor must
treat it as a brand-new operation, so it can't fully recover from whatever
loss of debug info the removal caused.
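
Here's a toy rendering of that scenario, with made-up structures and
function names (nothing here is actual GCC code): the monitor only sees
remove and add events, and because the remove event must be treated as if
the operation were gone for good, the binding it carried cannot survive a
move implemented as remove plus add.

  /* Toy sketch, hypothetical API: why move-as-remove-plus-add loses debug info.  */
  #include <stdio.h>

  struct op
  {
    const char *text;        /* e.g. "tmp = a + b" */
    const char *bound_var;   /* user variable whose value this op represents, or NULL */
  };

  static void
  monitor_remove (struct op *op)
  {
    /* The operation may never come back: any binding referring to it must go.  */
    if (op->bound_var)
      printf ("monitor: binding of '%s' lost\n", op->bound_var);
    op->bound_var = NULL;
  }

  static void
  monitor_add (struct op *op)
  {
    /* A "new" operation: no user variable is known to be bound to it.  */
    printf ("monitor: '%s' added, bound to '%s'\n",
            op->text, op->bound_var ? op->bound_var : "(nothing)");
  }

  int
  main (void)
  {
    struct op add = { "tmp = a + b", "x" };   /* tmp holds user variable x */

    /* A pass "moves" the operation by removing and re-adding it.  */
    monitor_remove (&add);   /* the binding of x is dropped here...           */
    monitor_add (&add);      /* ...and there is nothing left to recover here. */
    return 0;
  }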

The loss can be even greater if the operation, rather than being just
moved, is re-created without concern for debug information.  Think of
removing an insn and creating a new insn out of its pattern, without
preserving the debug info locators.

Would you consider this kind of transformation an active mistake, or
a failure to play by the rules?


Even if the API is extended so that operations can be moved without loss
of debug info, and every existing remove/add pair that could or should be
expressed in terms of the new interface is converted to it, new code could
still be added that uses remove and add rather than move.  This would generate
exactly the same executable code, but it would prevent debug information
from being preserved.
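
In terms of the same toy model (still hypothetical names, not GCC code),
the extended API would add a move entry point that tells the monitor the
operation survives.  The contrast below is exactly the one just described:
move_op keeps the binding, while new code that still goes through remove
and add produces the same resulting code but loses it.

  /* Toy sketch, hypothetical API: a dedicated move interface vs. remove + add.  */
  #include <stdio.h>

  struct op { const char *text; const char *bound_var; };

  static void monitor_remove (struct op *op) { op->bound_var = NULL; } /* binding lost */
  static void monitor_add    (struct op *op) { (void) op; }            /* "new" op     */
  static void monitor_move   (struct op *op) { (void) op; }            /* binding kept */

  static void
  move_op (struct op *op)          /* the new, debug-safe interface */
  {
    monitor_move (op);
  }

  static void
  old_style_move (struct op *op)   /* same resulting code, debug info lost */
  {
    monitor_remove (op);
    monitor_add (op);
  }

  int
  main (void)
  {
    struct op a = { "tmp = a + b", "x" };
    struct op b = { "tmp2 = c * d", "y" };

    move_op (&a);
    old_style_move (&b);

    printf ("a bound to %s, b bound to %s\n",
            a.bound_var ? a.bound_var : "(lost)",
            b.bound_var ? b.bound_var : "(lost)");   /* "a bound to x, b bound to (lost)" */
    return 0;
  }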

Would you qualify the addition of such new code as an active mistake, or
a failure to play by the rules?

After pondering this, do you agree that paying attention to debug
information concerns is not only something we already do routinely (just
not enough), but also something that can't really be helped?

If so, the question becomes how much computation you're willing to
perform, and how much baggage you're willing to carry, to reduce the risk
of errors caused by deviations from the rules.

Apparently most GCC developers don't mind carrying around the source
locator information in INSNs, EXPRs and STMTs, even though the lack of
care in checking its correctness has led to a great many errors that
went undetected over the years.  Andrew Macleod and Aldy Hernandez have
been giving these issues a lot more attention than I have, and they can
probably say more about how deep the hole dug by all these years of
erosion has become.

Apparently most GCC developers don't mind carrying around the original
declaration information in attributes of REGs and MEMs, used exclusively
for debug information.  AFAICT, for codegen, alias set numbers in MEMs
would suffice, but it takes actual effort to maintain the attributes
during some transformations, and although there are routines that
simplify this, nothing stops people from using the “old way” of, say,
creating MEMs with different offsets, and new such occurrences would
still show up every now and then shortly after the attrs were added.

However, there is a clear interest in reducing memory use and compile
time, and avoiding needless computation for debug info when no debug
info is wanted has been one of the major concerns in the discussions
about better debug info held over the last two years or so.  The design
I proposed enables this reduction, but it can certainly also work in the
wasteful way we've approached this issue so far.


And then, if we look into the risks of errors arising from the current
stance and from that of VTA, it appears to me that the current stance
favors codegen correctness without any concern for debug info
correctness, whereas the VTA approach introduces some risk of generating
different but equally correct code, with a much greater likelihood of
correct debug info.

I say “different but equally correct” because the presence of debug
stmts/insns can at most inhibit optimizations that fail to disregard
them (the sketch after the list below shows what disregarding them
amounts to).  I'd be the last person to dismiss the requirement of
generating the same executable code regardless of debug info options,
but considering the reasoning below, I believe the risk is acceptable:

- the availability of -fcompare-debug, and its regular use as part of
the development process, will greatly reduce the likelihood of running
into this kind of problem

- if you use VTA during the development phase and find that compilation
without debug info breaks your program, -fcompare-debug will confirm the
diagnosis, and then you can compile with VTA debug info and strip it off
afterwards

- if you're investigating an error in a program originally compiled
without debug info and you can't duplicate it with VTA enabled, you can
confirm the diagnosis with -fcompare-debug and then refrain from
enabling VTA.  You'll be no worse off than you are today, for you'll
still be able to use the same debug info we generate today.
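
To illustrate what disregarding debug stmts amounts to, here's a toy
dead-code pass with made-up structures (this is not GIMPLE and not GCC
code): debug binds never count as uses, so the pass makes exactly the same
decision it would make under -g0, and when it deletes the statement a bind
refers to, it retargets the bind instead of silently losing track of the
user variable.

  /* Toy sketch, made-up structures: a pass that disregards debug binds.  */
  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  struct stmt
  {
    const char *text;        /* e.g. "t = a * 2" */
    const char *defines;     /* name defined by this stmt, or NULL */
    const char *uses;        /* name used by this stmt, or NULL */
    bool is_debug_bind;      /* "# DEBUG x => t"-style statement */
    bool deleted;
  };

  /* Debug binds are disregarded here, so they never keep code alive.  */
  static bool
  used_by_real_code (struct stmt *s, int n, const char *def)
  {
    for (int i = 0; i < n; i++)
      if (!s[i].deleted && !s[i].is_debug_bind
          && s[i].uses && strcmp (s[i].uses, def) == 0)
        return true;
    return false;
  }

  static void
  eliminate_dead_code (struct stmt *s, int n)
  {
    for (int i = 0; i < n; i++)
      {
        if (s[i].is_debug_bind || s[i].deleted || !s[i].defines
            || used_by_real_code (s, n, s[i].defines))
          continue;

        /* Dead as far as codegen is concerned: before deleting it, retarget
           debug binds that mention its result to the computed expression,
           so the debugger can still show the user variable's value.  */
        for (int j = 0; j < n; j++)
          if (s[j].is_debug_bind && s[j].uses
              && strcmp (s[j].uses, s[i].defines) == 0)
            s[j].text = "# DEBUG x => a * 2";
        s[i].deleted = true;
      }
  }

  int
  main (void)
  {
    struct stmt seq[] = {
      { "t = a * 2",      "t",  "a", false, false },
      { "# DEBUG x => t", NULL, "t", true,  false },  /* only a debug use of t */
      { "return a",       NULL, "a", false, false },
    };

    eliminate_dead_code (seq, 3);   /* t is dead: the same decision as with -g0 */

    for (int i = 0; i < 3; i++)
      if (!seq[i].deleted)
        puts (seq[i].text);
    return 0;
  }

A pass that forgot the “disregard” step could wrongly consider t live and
keep it, and that is precisely the kind of codegen difference
-fcompare-debug is there to catch.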

Considering how many latent -g/-g0 errors I've fixed myself thanks to
the introduction of machinery to detect them, and how many new ones have
been introduced since I started monitoring them, I know the current
design doesn't offer the guarantees you seem to have been counting on.

This is obviously no excuse to go wild, counting on a safety net to keep
things right.  But the proposal on the table is certainly not a wild
one ;-)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
