On Dec 5, 2007, Diego Novillo <[EMAIL PROTECTED]> wrote:

> On 11/25/07 3:43 PM, Mark Mitchell wrote:
>> My suggestion (not as a GCC SC member or GCC RM, but just as a fellow
>> GCC developer with an interest in improving the compiler in the same
>> way that you're trying to do) is that you stop writing code and start
>> writing a paper about what you're trying to do.
>>
>> Ignore the implementation.  Describe the problem in detail.  Narrow
>> its scope if necessary.  Describe the success criteria in detail.
>> Ideally, the success criteria are mechanically checkable properties:
>> i.e., given a C program as input, and optimized code + debug
>> information as output, it should be possible to algorithmically prove
>> whether the output is correct.

> Yes, please.  I would very much like to see an abstract design
> document on what you are trying to accomplish.

Other than the ones I've already posted, here's one:

  http://dwarfstd.org/Dwarf3Std.php

Seriously.  There is a standard for this stuff.  My ultimate goal in
this project is that we comply with it, at least as far as emitting
debug information for the location of variables is concerned.

Here are some relevant postings on design strategies, rationales and
goals:

  http://gcc.gnu.org/ml/gcc/2007-11/msg00229.html (goals)
  http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00160.html (initial plan)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00261.html (detailed plan)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00317.html (example)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00590.html (more example)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00176.html (design rationale)
  http://gcc.gnu.org/ml/gcc/2007-11/msg00177.html (clarification)

> I would like to see exactly what Mark is asking for.  Perhaps a
> presentation in next year's Summit?

Sure, if there's interest, I could plan on doing that.  I could use
sponsors, BTW; I haven't discussed this with my employer, and writing
articles and giving presentations are not part of the assignment I was
given.  Anyhow, by the time of next year's Summit, I hope this will
mostly be old news.

> I don't think I understand the goal of the project.

Follow the standard, as in:

(1) emit debug information that is correct (standard-compliant), as in,
if we emit some piece of debug information, it reflects reality, rather
than being a sometimes distant approximation of some past reality long
destroyed by an optimization pass; and

(2) emit debug information that is more complete, as in, we currently
fail to emit a lot of debug information that we could, because
optimization passes fail to maintain the information needed to keep
track of the locations of variables.

> "Correct debugging info" means little, particularly if you say that
> it's not debuggers that you are thinking about.

Thinking of the debuggers is a mistake.  We don't think of specific
compilers when reading a programming language standard.  We don't think
of specific processors when reading an ISA or ABI specification.  Even
when we read documentation specific to a processor, we still don't
think of its internal implementation details in order to write a
compiler for it; even the scheduling properties are abstracted out in
the design specification and optimization guidelines.

When someone finds that the compiler deviates from one of these
standards, we just cite chapter and verse of the relevant standard, and
people see there's a bug.  Why should debug information standards be
treated any differently?

> It's certainly worrisome that your implementation seems to be
> intrusive to the point of brittleness.

What part of the intrusiveness are you concerned about?
The change of INSN_P such that it covers DEBUG_INSN_P too in the
supported range?  Or the few changes that revert to the original
INSN_P, in the few exceptions in which DEBUG_INSN_P is not to be
handled as an INSN?

I've heard this "intrusiveness" argument raised so many times, by so
many people who claim not to have been able to keep up with the thread,
and not to have looked at the patches at all, that I'm more and more
convinced it's fear of the unknown rather than any actual rational
evaluation of the impact of the changes.  Seriously.  Have a look at
the patches and tell me what in them you regard as intrusive.

We're talking about infrastructure here, needed to fix GCC's
carelessness about maintaining a mapping between source and
implementation concepts, a carelessness that went on for years and
years while optimizations were added and debug information was
degraded.  At some point you have to face reality and see that such
information isn't kept around by magic: it takes some effort, and this
effort is needed at every location where there are changes that might
affect debug information.  And that's pretty much everywhere.  Even if
we had consistent interfaces for making some changes, such as variable
renaming, substitution, etc., this would only cover a small fraction of
the data a debug info generator needs: it needs higher-level
information than that, especially in rtl, where transformations, for
historical reasons, are messier than in the tree IL.

So, the approach I've taken is to use the strength of the problem
against itself: take advantage of the fact that optimizers already know
how to perform the transformations they need to keep things consistent,
and represent debug information in a way that, to them, looks just like
any other use, so they adjust it likewise.  (There's a small sketch of
what this looks like further down.)  And then, on top of that, handle
the few exceptions in which the optimizer needs to do something
cleverer, because the transformation it performs wouldn't work when,
say, there's more than one use.

> Will every new optimization need to think about debug information
> from scratch and refrain from doing certain transformations?

Refraining from doing certain transformations would be wrong.  We don't
want debug information to affect code generation, and we don't want it
to reduce the amount of optimization you can do.  So, you optimize
away, and if you find that you can't keep track of debug information,
you mark stuff as unavailable, or, most likely, the safety nets in
place will do that for you, rather than taking the current approach, in
which we silently corrupt debug information.

Sure, this might require a little bit more thinking in some
optimizations.  But in my experience fixing up the tree and rtl passes
that needed tweaking, the additional thinking is a no-brainer in most
cases; in a few, you have to work a bit harder to keep information
around rather than simply noting it as unavailable.  But it has never
required optimizations to be disabled, and it must not do so.  In fact,
in a few cases, I noticed we were missing trivial optimizations and
fixed them.

> In my simplistic view of this problem, I've always had the idea that
> -O0 -g means "full debugging bliss", -O1 -g means "tolerable
> debugging" (symbols shouldn't disappear, for instance, though they do
> now) and -O2 -g means "you can probably know what line+function you're
> executing".

I've never seen this documented as such, and we've never worked toward
these stated goals.
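Here's the sketch I promised above of how representing debug bindings
as ordinary-looking uses plays out.  This is a hand-written
illustration of mine, not a literal dump from the patches, and the
exact annotation syntax is only meant to be suggestive:

  /* Source: */
  int
  f (int a)
  {
    int x = a;        /* a simple copy the optimizers will remove */
    return x + 1;
  }

  /* Before copy propagation, the IL carries an annotation binding the
     user variable "x" to the value it currently has; to the pass, the
     annotation's right-hand side is just another use of x_1:

         x_1 = a_2;
         # DEBUG x => x_1
         _3 = x_1 + 1;
         return _3;

     When the pass propagates a_2 into the uses of x_1 and deletes the
     now-dead copy, the same substitution machinery rewrites the
     annotation, so the binding stays correct:

         # DEBUG x => a_2
         _3 = a_2 + 1;
         return _3;

     If a transformation can't express the value at all, the annotation
     is reset to "value unavailable" instead of being left stale.  */

The annotations are only read by the debug info generator at the very
end; the intent is that they never affect code generation, they just
ride along with the transformations.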
However, I see that, underlying all of this, we should be concerned
about emitting debug information that is correct, i.e., we should never
emit information that says the location of FOO is BAR while it's
actually at BAZ.

I've seen many people (including myself, in a distant past) claim that
imprecise information is better than no information.  I've learned
better.  Debug information consumers are often equipped with heuristics
to fill in common gaps in debug information.  But if the information is
there, and wrong, the heuristics that might very well have worked are
disabled in favor of the incorrect information, and then the whole
system (debuggers, monitors, etc., along with the program) misbehaves.
And even when heuristics don't exist and the information is gone, it's
better to tell the user "I don't know how to get you that" than to hand
them something other than what they need (e.g., an incorrect variable
location).

> But you seem to be addressing other problems.  And it even seems to me
> that you want debugging information that is capable of deconstructing
> arbitrary transformations done by the optimizers.

No.  I don't see where this notion came from, but it appears to be
quite widespread.

Omitting certain pieces of debug information is almost always correct,
since most debug info attributes are optional.  But emitting
information that doesn't reflect the program is always incorrect.  So,
if you perform an arbitrary transformation that is too hard to
represent in debug information, that's fine, just throw the information
away.  The debug information might become less complete, and therefore
less useful, but at least it won't induce errors elsewhere.

The parallel I draw is that emitting an optional piece of debug
information is like applying an optional optimization.  If it's
correct, and it's not too expensive, go for it.  But if it's going to
get you the wrong output, it's broken, so don't do it.

-- 
Alexandre Oliva                 http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member                http://www.fsfla.org/
Red Hat Compiler Engineer       [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}