On Dec 18, 2007, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

> Alexandre Oliva <[EMAIL PROTECTED]> writes:
>
>> A plan to fix local variable debug information in GCC
>>
>> by Alexandre Oliva <[EMAIL PROTECTED]>
>>
>> 2007-12-18 draft
> Thank you for writing this.  It makes an enormous difference.

NP.  Thanks for the encouragement.

>> == Goals

> I note that you don't say anything about the other big problem with
> debugging optimized code, which is that the debugger jumps around all
> over the place.

Yep, it's a separate project, one that I'm somewhat interested in, and
it may even be somewhat easy to fix with judicious use of is_stmt
notes, but it's not my top priority ATM.

>> Once this is established, a possible representation becomes almost
>> obvious: statements (in trees) or instructions (in rtl) that assert,
>> to the variable tracker, that a user variable or member is
>> represented by a given expression:
>>
>>   # DEBUG var expr
>>
>> By var, we mean a tree expression that denotes a user variable, for
>> now.  We envision trivially extending it to support components of
>> variables in the future.

> While you say that this is almost obvious, it still isn't obvious at
> all to me.  You consider trees and RTL together, but I don't see why
> that is appropriate.

You snipped (skipped?) one aspect of the reasoning on why it is
appropriate.  Of course this doesn't prove it's the best possibility,
but I haven't seen evidence that it isn't.

> My biggest concern at the tree level is the significantly increased
> memory usage

One of the first measurements we had from my code was from Richi, who
reported that it didn't increase memory use by much.

> and the introduction of a sort of a weak pointer to
> values.  Since DEBUG statements shouldn't interfere with
> optimizations, we need to explicitly ignore them in things like
> has_single_use.

That's probably the easiest part, and it's already done.

> But since our data structures need to be coherent, we can not ignore
> them when we actually eliminate SSA names.  That seems sort of
> complicated.

It's not.  The code to do this is ready.
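To make the role of the `# DEBUG var expr` annotations concrete, here
is a toy model of my own (an illustration, not GCC code): a forward
scan over an instruction stream in which DEBUG entries assert the
current binding of a user variable, so a consumer can recover correct
values at inspection points even after the original assignments were
optimized away.

```python
# Toy model of a variable tracker consuming "# DEBUG var expr"
# annotations.  Purely illustrative; not GCC's implementation.

def track(insns):
    """Scan insns in order; DEBUG entries update bindings, PROBE
    entries snapshot what a debugger should see at that point."""
    bindings = {}   # user variable -> expression currently holding it
    at_probe = {}   # probe name -> snapshot of bindings at that point
    for insn in insns:
        if insn[0] == "DEBUG":          # ("DEBUG", var, expr)
            _, var, expr = insn
            bindings[var] = expr
        elif insn[0] == "PROBE":        # ("PROBE", name)
            at_probe[insn[1]] = dict(bindings)
    return at_probe

# i = x and j = y were optimized away, but the DEBUG annotations keep
# the *point* of assignment: i and j are unknown at p1, known at p2.
insns = [
    ("PROBE", "p1"),
    ("DEBUG", "i", "x_1(D)"),
    ("DEBUG", "j", "y_3(D)"),
    ("PROBE", "p2"),
]
print(track(insns))
# -> {'p1': {}, 'p2': {'i': 'x_1(D)', 'j': 'y_3(D)'}}
```

The key property is that the snapshot at p1 is empty while the one at
p2 is not: a representation that records only "i is x, j is y" per SSA
name, without the point of assignment, cannot distinguish the two.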
After I got bootstrap-debug to pass on x86_64-linux-gnu, I don't
recall needing any further changes in the tree passes for
i386-linux-gnu, and none of the ia64-linux-gnu or ppc64-linux-gnu
fixes I've made so far (most of them to their machine-dependent
schedulers) required changes in the tree passes either.  So we can
safely count that as easy and maintainable.  Looking at the patches
for the tree infrastructure in the vta branch will give you a very
good idea of the effort involved.

> In SSA form it seems very natural to provide a set of associations
> with user variables for each GIMPLE variable.

Yes.  This provides for a simple AND WRONG representation (but not a
hopeless one; see below, after the sample code).  We went through some
of this already.  You can't recover the information with a scheme that
throws away the point of assignment; even the basic block of the
assignment is lost.  You can't generate correct debug information this
way.  The limitation of approaches like this is addressed in passing
in the examples, but I didn't want to carry discussions about broken
designs that I thought we'd already left behind into the concise
design document.

> Since the GIMPLE variables never change, these associations never
> change.  We have to get them right when we create a new GIMPLE
> variable and when we eliminate a GIMPLE variable.

Maybe you can show us how to represent the annotations for the two
trivial examples I've chosen in the paper, to show that the compiler
stands a chance of generating correct debug information with them.

> Of course this means that we are keeping the debug information in a
> reversed form.

That by itself is not such a big deal; it would just lose some
completeness, and it would probably carry around lots of useless
notes.  The real problem is that it loses information that is
essential for generating correct debug information.
> Instead of saying that a user variable is associated with an
> expression in terms of GIMPLE variables, we will say that a GIMPLE
> variable is associated with an expression in terms of user
> variables.

Let me see if I understand what you have in mind.  Given:

  int f(int x, int y) {
    int i, j;
    probe1();
    i = x;
    j = y;
    probe2();
    if (x < y)
      i += y;
    else
      j -= x;
    probe3();
    return g (i, j);
  }

we'd SSAify it into something like:

  int f(int x, int y) {
    int i;
    int j;
    int T;
    probe1();
    i_0 = x_1(D);                              /* i */
    j_2 = y_3(D);                              /* j */
    probe2();
    if (x_1(D) < y_3(D))
      i_4 = i_0 + y_3(D);                      /* i */
    else
      j_5 = j_2 - x_1(D);                      /* j */
    i_6 = PHI <i_4(bb_then), i_0(bb_else)>     /* i */
    j_7 = PHI <j_2(bb_then), j_5(bb_else)>     /* j */
    probe3();
    T_8 = g (i_6, j_7);
    return T_8;
  }

And I can see that setting breakpoints at the probe points would get
you correct values for i and j.  In fact, these annotations, so far,
are no different from what we already have today.

But then, if we optimize this just a little bit, I can't quite tell
what we'd get that would enable correct debug information:

  int f(int x, int y) {
    int i;
    int j;
    int T;
    probe1();
    /* p1: ??? i, j */
    probe2();
    if (x_1(D) < y_3(D))
      i_4 = x_1(D) + y_3(D);                   /* i */
    else
      j_5 = y_3(D) - x_1(D);                   /* j */
    i_6 = PHI <i_4(bb_then), x_1(D)(bb_else)>  /* i */
    j_7 = PHI <y_3(D)(bb_then), j_5(bb_else)>  /* j */
    probe3();
    T_8 = g (i_6, j_7);
    return T_8;
  }

Now, if you tell me that the information about i_0 and j_2 is
backward-propagated to the top of the function, where x and y are set
up, then I introduce, say, zero-initialization for i and j before
probe1() (an actual function call, mind you), and the representation
is provably broken.  And if you tell me that you just discard that
information, then at probe2() the variables will appear to be
uninitialized (or zero-initialized, after that change), and again the
representation is wrong.
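For contrast, here is a sketch of how the representation I propose
would annotate the optimized version: debug stmts remain at the points
of the deleted assignments (the `=>` syntax is illustrative; the exact
dump form may differ):

```
  probe1();
  # DEBUG i => x_1(D)   /* stands in for the deleted i_0 = x_1(D) */
  # DEBUG j => y_3(D)   /* stands in for the deleted j_2 = y_3(D) */
  probe2();
  if (x_1(D) < y_3(D))
    i_4 = x_1(D) + y_3(D);
  else
    j_5 = y_3(D) - x_1(D);
  ...
```

At probe1() the tracker sees no bindings for i and j, and at probe2()
it sees i in x and j in y, which is exactly what the unoptimized
program would show.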
If you tell me that you keep notes at those points to tell debug
information generation that at probe2() both variables have unknown
values, then you may get correct debug information, but you're
willfully making it incomplete for an extremely common scenario (this
example is intentionally made similar to what you get after one pass
of inlining into f, where i and j were formal arguments of the inlined
function).

If you tell me that you keep notes at that point that indicate the
expected values of i and j, then you've arrived at the representation
I propose.

If you tell me that you keep different notes between probe1() and
probe2(), notes that just mark the point at which i and j receive the
values of x and y, while the annotations remain attached to the SSA
assignments, then this stands a chance of generating correct debug
information.  Something like:

  x_1(D)  /* x starting at entry point, and also i starting at p1 */
  y_3(D)  /* y starting at entry point, and also j starting at p1 */

Maybe these annotations, interspersed in the code, might be easier to
handle.  I hadn't considered this before.  It's worth investigating.

But I still don't have your proposal entirely clear in my mind.  I
don't quite see how it would handle transformations other than trivial
substitutions.  Can you perhaps give examples of how you'd get from
trivial annotations to more complex, potentially ambiguous
expressions, as optimization passes make complex transformations?
Maybe what you have in mind is something along the lines of induction
variables, which loop optimizers would have to annotate explicitly; is
that so?

> It is of course true that optimized code will move around
> unpredictably, and your proposal doesn't handle that.

It does handle that, in that a variable is regarded as being assigned
a value when execution crosses the debug stmt/insn originally inserted
right after the assignment.  This is by design, but I realize now that
I forgot to mention it in the design document.
The idea is that debug insns get high priority in scheduling.  Since
they refer to the assignment just before them, if the assignment is
merely moved earlier, without an intervening scheduling barrier, the
debug insn will follow it.  If the assignment is removed, the debug
insn can legitimately be moved up to the point where the assignment,
had it remained, might have been moved up to.

However, if the assignment is moved to a separate basic block, say out
of a loop or a conditional, then we don't want the debug insn to move
with it: hoisting and commonizing are regarded as setting temporaries,
and the value is only "committed" to the variable when we reach the
point where the assignment would originally have taken place.  Neat,
eh?  I'll add something to this effect to the design document.

> I don't see it as a flaw that it will be possible to view user
> variables outside of their source code range.

Agreed.  Extending the range of a (variable, value) binding to a point
at which the variable wouldn't exist (yet, or any more) without
optimization is fine, but extending the range of such a binding across
an assignment, even an optimized-away one, isn't.

> It's not obvious to me why a DEBUG insn is superior to a REG_NOTE
> attached to an insn.

Mainly because we won't always want to move the note along with the
insn.  A REG_NOTE also isn't unambiguous for parallel sets, though
there are ways around that.  As written in the document, combining the
debug annotation with an assignment is doable and hasn't been
discarded from the plan, but at some point the note may need to be
detached, and then it's not clear to me that the potential memory
savings of the combination are worth the additional maintenance burden
of splitting the notes out on demand, which is my greatest concern.
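A sketch of the hoisting case may help (illustrative pseudo-GIMPLE,
with made-up temporary names): the computation moves out of the
conditional into a temporary, but the debug binding stays put, so the
value is only committed to the user variable where the assignment
originally stood:

```
  T_9 = x_1(D) + y_3(D);     /* hoisted computation; sets a temporary,
                                not the user variable */
  if (x_1(D) < y_3(D))
    {
      # DEBUG i => T_9       /* i only appears to change here, where
                                the original i += y took place */
    }
```

If the debug binding had moved along with the hoisted computation, a
debugger stopped before the conditional would wrongly show i already
updated on the path where the assignment never executes.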
On top of that, after splitting, all the maintenance burden (however
small) of dealing with stand-alone debug annotations would have to be
undertaken anyway, so it appears to me that the combination would just
add complexity.  But then again, I'm not sure about it, so I haven't
ruled it out; the design is open to it.

> The problem with DEBUG insns is of course that the RTL code
> is very sensitive to new insns, and also the additional memory usage.
> You discuss those, but it's not obvious to me why your proposed
> solution is the best one.

I can't assert it's the best, no matter how hard I've worked on this
design.  I've presented my thoughts (or at least as many of them as I
could remember; I may have forgotten some along the way ;-), and I've
shown why other designs presented before didn't solve the problem I
had to solve, as far as I could tell.  Your annotations along with the
point-marking notes are an approach I hadn't considered before, and
I'm pretty sure I don't yet follow to the fullest extent how it would
work, but on first sight it appears to me that it might.  So let's
look further into it.

>> Testing for accuracy and completeness of debug information can be
>> best accomplished using a debugging environment.

> Of course this is very unsatisfactory without an automated testsuite.

Err...  I didn't say that testing through a debugging environment
wouldn't be automated.  My plan is to use something along the lines of
the GDB testsuite scripts, but whether to use GDB or some other
debugging or monitoring infrastructure is a tiny implementation detail
that I haven't worried about at all.  The basic idea is to script the
inspection of variables and verify that the obtained values are the
expected ones, or that the variables are defensibly unavailable at the
inspection points.
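As an illustration of the kind of scripted check I have in mind (a
sketch only: the file name test.c and the probe functions are
hypothetical, and the actual testsuite would use DejaGnu-style expect
scripts rather than a shell one-liner), one could drive GDB in batch
mode over an optimized binary:

```shell
# Compile the probe example with optimization and debug info, then
# stop at probe2 and check what the debugger reports for i and j.
gcc -O2 -g test.c -o test
gdb -batch \
    -ex 'break probe2' -ex run -ex finish \
    -ex 'print i' -ex 'print j' \
    ./test
```

The harness would then compare the printed values against the ones the
unoptimized program would show, accepting a "value optimized out"
answer only where it is defensible.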
-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member          http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}