On Nov 7, 2007, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

>> Does it really matter?  Do we compromise standards compliance (and
>> so violently, at that) in any aspect of the compiler?
> What standards are you talking about?

Debug information standards such as DWARF-3.

> I'm not aware of any standard for debuggability of optimized code.

I'm talking about standards that specify how a compiler should encode
meta-information about how source-code concepts map to the code it
generated.  See, for example, Section 2.6 of the DWARF-3
specification.  It says very little about optimization, but it does
specify what a DW_AT_location, if present, means.  It doesn't say
anything like "if a variable is available at a certain location most
of the time, you can emit a DW_AT_location that refers to that
location".  It says:

  Debugging information must provide consumers a way to find the
  location of program variables, determine the bounds of dynamic
  arrays and strings, and possibly to find the base address of a
  subroutine's stack frame or the return address of a subroutine.

See, it's not about debuggers, it's about consumers.  And it's an
obligation, not really an option (that said, DW_AT_location *is*
optional).  The specification then describes two forms of location
description:

  1. Location expressions, which are a language-independent
     representation of addressing rules of arbitrary complexity built
     from DWARF expressions.  They are sufficient for describing the
     location of any object as long as its lifetime is either static
     or the same as the lexical block that owns it, and it does not
     move throughout its lifetime.

  2. Location lists, which are used to describe objects that have a
     limited lifetime or change their location throughout their
     lifetime.

Nowhere does it state that "if the compiler can't quite keep track of
the location of a variable, it can be sloppy and emit just whatever
is simpler or appears to make sense".  It does say:

  Address ranges may overlap.  When they do, they describe a
  situation in which an object exists simultaneously in more than one
  place.
  If all of the address ranges in a given location list do not
  collectively cover the entire range over which the object in
  question is defined, it is assumed that the object is not available
  for the portion of the range that is not covered.

So it does make room for *some* sloppiness, after all.  That's what I
refer to as "incompleteness" of debug information.  If we fail to
keep track of where an object is, it's sort of OK (although
undesirable) to emit debug information that omits the location of the
object in certain program regions where it might be live.

However, it is not standard-compliant to emit information stating
that the object is available at certain locations if it is NOT really
there, or if it is available elsewhere, in addition to or instead of
the locations we've emitted.  That's what I refer to as
"incorrectness" of debug information.

Incorrectness in the compiler output is always a bug.  No matter how
hard it is to implement, or how resource-intensive the solution is,
arguing that we've made a trade-off and decided to generate wrong
output for this case doesn't make it any less of a bug.

Incompleteness is a completely different issue.  This is where we
*can* afford to make trade-offs.  Just like we can decide to omit
certain optimizations, or not to carry them out to the greatest
possible extent, or to experiment with various different heuristics,
we could afford to emit incomplete debug information; it's "just" a
quality-of-implementation issue.  But not incorrect debug
information; that's just a bug.

> gcc's users are definitely calling for a faster compiler.  Are they
> calling for better debuggability of optimized code?

This is not just about debuggability, as I've tried to explain from
the very beginning of this discussion, a couple of months ago.  Debug
information is not just about debuggers any more.  There are good
reasons why the DWARF-3 standard says "consumers" rather than
"debuggers".
It's no longer just a matter of convenience, recompile at -O0 if you
want to debug it.  It's a matter of correctness: various monitoring
tools now rely on this meta-information, and rightfully so.

>> > We've fixed many many bugs and misoptimizations over the years
>> > due to NOTEs.  I'm concerned that adding DEBUG_INSN in RTL
>> > repeats a mistake we've made in the past.
>>
>> That's a valid concern.  However, per this reasoning, we might as
>> well push every operand in our IL out to separate representations,
>> because there have been so many bugs and misoptimizations over the
>> years, especially when the representation didn't make
>> transformations trivially correct.

> Please don't use strawman arguments.

It's not a strawman, really.  A reference to an object within a debug
stmt or insn is very much like any other operand, in that most
optimizer passes must keep them up to date.  If you argue for pushing
them outside the IL, why would any other operands be different?

> As I understand your proposal, it materializes variables which were
> otherwise omitted from the generated program.  It doesn't address
> the other issues with debugging optimized code, like bouncing
> around between program lines.  Is that correct?  What else does
> your proposal do?

All it does is try to carry, throughout compilation, information
about what value the user is entitled to expect a variable to hold at
each point in the program.  That way, even if the compiler doesn't
retain something that represents only that variable through to the
end of compilation, we still have information about where, or at
least what, its value is, if it is available anywhere, and we can
include this piece of data in the debug information.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist    [EMAIL PROTECTED], gnu.org}