On Jun 6, 2009, Eric Botcazou <ebotca...@adacore.com> wrote:

>> So if I understand the above right then VTA is a new source of
>> code-generation differences with -g vs. -g0. A possibly quite
>> bad one (compared to what we have now).
> IIUC it's a paradigm shift: currently the absence of differences in the
> generated code is guaranteed by the absence of differences in the IR all
> the way from the initial GENERIC down to the final RTL. In other words,
> unless a pass makes an active mistake, it preserves the invariant.

It would be nice if it worked this way, but the dozens of patches to fix -g/-g0 compilation differences I've posted over the last several months show it's really not that simple, because the codegen IR does not tell the whole story. We have what amount to IR extensions for debug info, for types and templates, for aliasing information, even for GC of internal data structures, and all of these do affect codegen, sometimes in very subtle ways.

I agree it would be nice if things worked as you describe, but that's not where we are, and in more ways than the ones I just mentioned.

Speaking specifically of debug information, the little attention we pay to preserving the information needed to generate correct debug info means that introducing errors is not just a matter of active mistakes. Most of the debug information errors we have today do not come from actively breaking debug info data structures, but rather from passively failing to keep them up to date, or even from the absence of data structures for that purpose.

Now, once we realize we need additional data structures to retain a correct mapping between source-level constructs and the result of the transformations that occur throughout compilation, it shouldn't be hard to see that there are two options:

1. maintain those IR data structures regardless of whether we're emitting debug information, spending computation time and memory to keep the computation identical, so as to avoid risk, and in the end discard the results the user didn't ask for; or

2. avoid the unnecessary computation and memory use by accepting that there are going to be IR differences between compilations with and without -g, and work towards minimizing the risks of such differences.

I can certainly understand the wish to keep the debug info IR out of sight, and have it all maintained sort of by magic, without any need for developers to even think about it. While I share that wish and even tried to satisfy it in the design, I've come to the conclusion that it can't be done. And it's not just a “it can't be done without major surgery in GCC as it is today”, it's a “it can't be done at all”. Let me share with you the example by which I proved it to myself.

Consider an IR API that offers interfaces to remove operations from and add operations to sequences. If you want to move an operation, you remove it from its original position and add it at another. The problem is that, the moment you remove it, any debug info monitor running behind the scenes has to behave as if the operation will never be seen again, adjusting the debug info IR to compensate for its absence as best it can, and keeping no references to it. Then, when the operation is re-added, the debug info monitor must treat it as a brand new operation, so it can't fully recover from whatever loss of debug info the removal caused.

The loss can be even greater if the operation, rather than being just moved, is re-created without concern for debug information. Think of removing an insn and creating a new insn out of its pattern, without preserving the debug info locators. Would you consider this kind of transformation an active mistake, or a failure to play by the rules?
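To make this concrete, here's a minimal toy sketch in C. None of the names below (op, remove_op, add_op, user_var) are GCC's; they're invented just to illustrate that a remove/add pair which leaves the executable code unchanged still forces the debug side of the IR to forget what it knew:

  /* Purely illustrative sketch, not GCC code: a toy IR sequence whose
     operations carry a debug-only binding to a user variable.  */

  #include <stdio.h>
  #include <string.h>

  #define MAX_OPS 16

  struct op
  {
    const char *code;      /* what the operation computes (codegen view) */
    const char *user_var;  /* debug-only: user variable it sets, if known */
  };

  static struct op seq[MAX_OPS];
  static int nops;

  /* Remove an operation from the sequence.  Once removed, the debug info
     machinery must assume it may never come back, so it cannot keep the
     binding to the user variable alive.  */
  static struct op
  remove_op (int i)
  {
    struct op removed = seq[i];
    memmove (&seq[i], &seq[i + 1], (nops - i - 1) * sizeof (struct op));
    nops--;
    removed.user_var = NULL;  /* the source-level binding is dropped here */
    return removed;
  }

  /* Insert an operation.  As far as debug info is concerned, this is a
     brand new operation; whatever binding it once had cannot be
     recovered.  */
  static void
  add_op (int i, struct op o)
  {
    memmove (&seq[i + 1], &seq[i], (nops - i) * sizeof (struct op));
    seq[i] = o;
    nops++;
  }

  int
  main (void)
  {
    struct op a = { "t1 = x + y", "user_var_z" };
    struct op b = { "t2 = x - y", "user_var_w" };
    add_op (0, a);
    add_op (1, b);

    /* A pass "moves" the first operation after the second by removing and
       re-adding it.  The executable code is unchanged, but the binding to
       user_var_z does not survive the round trip.  */
    add_op (1, remove_op (0));

    for (int i = 0; i < nops; i++)
      printf ("%-12s  debug binding: %s\n", seq[i].code,
              seq[i].user_var ? seq[i].user_var : "<lost>");
    return 0;
  }

A move primitive could carry the binding along; remove followed by add, by construction, cannot.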
Even if the API is extended so as to move operations without loss of debug info, and all existing remove/add pairs that could or should be expressed in terms of this new interface are converted to it, new code could still be added that uses remove and add rather than move. This would generate exactly the same executable code, but it would prevent debug information from being preserved. Would you qualify the addition of such new code as an active mistake, or as a failure to play by the rules?

After pondering this, do you agree that paying attention to debug information concerns is not only something we already do routinely (just not enough), but also something that can't really be helped? If so, the question becomes how much computation you're willing to perform, and how much baggage you're willing to carry, to reduce the risk of errors caused by deviations from the rules.

Apparently most GCC developers don't mind carrying around the source locator information in INSNs, EXPRs and STMTs, even though the lack of care in checking its correctness has let a great many errors go undetected over the years. Andrew Macleod and Aldy Hernandez have been giving these issues a lot more attention than I have, and they can probably say more about how deep the hole dug by all these years of erosion has become.

Apparently most GCC developers don't mind carrying around the original declaration information in the attributes of REGs and MEMs, used exclusively for debug information. AFAICT, for codegen, alias set numbers in MEMs would suffice, but it takes actual effort to maintain the attributes during some transformations, and although there are routines that simplify this, nothing stops people from using the “old way” of, say, creating MEMs with different offsets by hand, and new such occurrences kept showing up every now and then after the attributes were added. (A toy sketch of this pattern follows below.)

However, there is a clear interest in reducing memory use and compile time, and avoiding needless computation for debug info when no debug info is wanted has been one of the major concerns in the discussions about better debug info over the last two years or so. The design I proposed enables this reduction, though it can certainly also work in the wasteful way we've approached this issue so far.

And then, if we compare the risks of errors arising from the current stance with those of VTA's, it appears to me that the current stance favors codegen correctness without any concern for debug info correctness, whereas the VTA approach introduces some risk of generating different but equally correct code, with a much greater likelihood of correct debug info. I write “different but equally correct” because the presence of debug stmts/insns can at most inhibit optimizations that fail to disregard them.
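Here's the toy sketch of the REG/MEM attributes situation I mentioned above. Again, none of the names (mem_ref, alias_set, source_decl, adjust_offset) are GCC's; the point is just that the codegen-relevant part of a memory reference survives either way, while the debug-only part survives only if the transformation bothers to carry it:

  /* Toy model, not GCC's representation: a memory reference whose alias
     set matters for codegen and whose declaration pointer exists only for
     debug info.  */

  #include <stdio.h>

  struct mem_ref
  {
    int alias_set;            /* needed for correct code generation */
    const char *source_decl;  /* debug-only: the user object being accessed */
    long offset;
  };

  /* The careful way: a helper that adjusts the offset and carries the
     debug-only attribute along.  */
  static struct mem_ref
  adjust_offset (struct mem_ref m, long delta)
  {
    m.offset += delta;
    return m;                 /* source_decl preserved */
  }

  /* The "old way": rebuild the reference from the fields codegen needs,
     silently dropping the debug-only attribute.  */
  static struct mem_ref
  rebuild_with_offset (struct mem_ref m, long delta)
  {
    struct mem_ref n = { m.alias_set, NULL, m.offset + delta };
    return n;                 /* source_decl lost; generated code unaffected */
  }

  int
  main (void)
  {
    struct mem_ref m = { 3, "my_struct.field", 0 };
    struct mem_ref a = adjust_offset (m, 8);
    struct mem_ref b = rebuild_with_offset (m, 8);

    printf ("helper:   alias set %d, offset %ld, decl %s\n",
            a.alias_set, a.offset, a.source_decl ? a.source_decl : "<lost>");
    printf ("by hand:  alias set %d, offset %ld, decl %s\n",
            b.alias_set, b.offset, b.source_decl ? b.source_decl : "<lost>");
    return 0;
  }

Both variants produce the same generated code; only the debug-only attribute differs, which is exactly the kind of silent loss the attributes were meant to prevent.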
I'd be the last person to dismiss the requirement of generating the same executable code regardless of debug info options, but considering the reasoning below, I believe the risk is acceptable:

- the availability of -fcompare-debug, and its regular use as part of the development process, will greatly reduce the likelihood of running into this kind of problem;

- if you use VTA during the development phase and find that compilation without debug info breaks your program, -fcompare-debug will confirm the diagnosis, and you can then compile with VTA debug info and strip it off afterwards;

- if you're investigating an error in a program originally compiled without debug info and you can't duplicate it with VTA enabled, you can confirm the diagnosis with -fcompare-debug and then refrain from enabling VTA. You'll be no worse off than we are today, for you'll still be able to use the same debug info we generate today.

Considering how many latent -g/-g0 errors I've fixed myself thanks to the introduction of machinery to detect them, and how many new ones have been introduced since I started monitoring them, I know the current design doesn't offer the guarantees you seem to have been counting on. This is obviously no excuse to go wild, counting on a safety net to keep things right. But the proposal on the table is certainly not a wild one ;-)

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer