On Mar 21, 2009, Andrew MacLeod <amacl...@redhat.com> wrote:

> I was planning to send this note next week, but I guess I should know
> better than to put something on the wiki and think it won't be
> noticed, even if it is in a crusty dark corner :-)
:-)

> I see on #gcc that my debuglocus document has been discovered.  It
> describes the work Aldy and I are experimenting with.  It can be
> found on http://gcc.gnu.org/wiki/AndrewMacLeod

> I've been trying to find alternative implementations or approaches
> to Alex's VTA work to see if its possible to avoid introducing a
> whole new class of instructions into the IL.

First of all, I'd like to thank you for putting so much thought into
this project.  Regardless of whether this approach proves to be
workable (you know I have several concerns about it), a number of very
interesting ideas came out of it that can certainly find their way
into whatever improved debug info infrastructure GCC adopts in the
end.  More on my concerns below.

I also thank you for having paid close attention to a number of issues
I pointed out ever since you first got involved in this project.  It
is easy to see how some of them have found their way into the
published Debuglocus proposal, and even into the earlier drafts you
shared with me long ago.  I can see that our interactions have helped
improve and clarify some passages I had misunderstood in my first
readings.

As you're well aware, my primary concern with this staged approach is
that the most difficult problems, for which we don't have solutions
thought through even for the known issues in them, are relegated to
later stages.  AFAICT there's a significant risk that we might end up
with a partial implementation that doesn't quite solve the problem:
after putting some effort into the first stages, we won't want to give
them up, even if we find unworkable problems in the later stages.  If,
for this reason, we end up settling for an incomplete solution, we
might end up with something equivalent to the approach taken in the
var-mappings branch, which you called stage 1: both are known not to
be enough to solve the problem.
Since there are open and hard problems beginning in stage 2, I'd
rather see a workable roadmap early than fear a dead end down the
road.  That's why, in my design, I faced the full complexity up front,
and designed solutions for every part of the problem before I started
implementing it: when I did start, I knew I had something that would
work, that would fully solve the problem it was meant to solve.

Now, I have a pretty good idea of the complexity we're facing here.  I
can tag along with some of the hand-waving, as you put it, like I've
done since early on.  However, because I haven't noticed any progress,
even at the planning level, on the hard issues I brought up, which
will have to be faced down the road if the problem is to be solved in
the end, I'll point them out below.  Hopefully someone will be able to
come up with a workable design for them (both you and I have failed to
fill in these gaps so far), and give us all confidence that this is
more than a "wouldn't it be so nice if it actually stood a chance of
working?".

On to the concerns...

=== Insns with multiple outputs

There's no provision whatsoever in the design for parallels with
multiple sets or other implicit modifications.  Parallels with more
than one set are not very common in regular insns, the exception
probably being divmod insns.  However, parallels are the way asm
statements with multiple outputs are represented, and these are quite
prevalent in kernel code (systemtap, anyone?).

On top of that, there's another class of modifications that are not
represented with sets: pre/post-auto-increment/decrement/modification.
These are part of addressing modes, they make a *lot* of difference on
machines that support such addressing modes, and they are often used
on user-visible pointers.  Nevertheless, AFAICT they are completely
left out of the design you have proposed.
And then there's vectorization, which could group together separate
user-visible variables that undergo similar operations so that they
can be performed in a single instruction, even in a single set,
without so much as an explicit parallel RTL construct.

Maybe you do have plans to address these issues, but given that in no
case does your notation have means to choose one out of many outputs
of an insn (and, inductively, of other insns, even after introducing
temps in stage 5), I don't see how this could be pulled off.

=== Unrelated concerns in stage 2

I can't tell whether stage 2 is an effort to reduce the amount of
debug information emitted (and maintained throughout compilation?), or
to avoid the jumping back and forth while single-stepping optimized
code.  Regardless, it appears to conflate debug-info concepts that
hold no relationship whatsoever to each other, while at the same time
inventing another mechanism to implement something that's already
standardized in DWARF.

Source line number and variable location information are two unrelated
mappings in debug info.  The former maps ranges of PCs to source file
and line numbers; the latter maps a variable and a PC range to
run-time loci; and both are essentially unidirectional mappings.  That
is, from a PC you can locate the corresponding line number (there may
be more than one, if zero-length ranges are used), and you can tell
exactly where each variable is (there may be multiple locations), but
there may be multiple non-contiguous PC ranges that map to the same
line numbers and locations.  There may even be overlapping regions in
location lists.

There's no correspondence between line numbers and variable loci, and
I don't quite see the point of trying to get them to interfere with
each other.  I don't see how a project to choose "special" line number
changes fits in a project to generate accurate variable location
information.  You don't even need any line numbers whatsoever for
variable locations to be useful.
Furthermore, line number changes in DWARF have a boolean flag named
is_stmt, and single-stepping and breakpoints are supposed to pay
attention only to line number changes that carry that flag set.

Now, having certain instructions marked as beginnings of statements
doesn't mean you don't want line number information for instructions
that aren't.  Say, if you're single-stepping instructions, rather than
source lines, it helps to be able to relate instructions with their
source lines, and to inspect variables, very much like you want
precise saved-register information at call sites to inspect state at
callers, even if the instruction after the call is not the beginning
of a statement.

Besides, if you're using hardware watchpoints to monitor some variable
or memory location, be it in the debugger or in some automated monitor
that is part of a larger system, you'd still want accurate values for
variables at the point the debugger stopped, rather than having to
wait for the next "relevant" instruction for the view to catch up.
Conditional watchpoints that might be referencing outdated variables,
along with the inability to tell whether the view is up to date at the
instruction at which the program stopped, make me think this effort is
misdirected.

IMHO, binding points should model, as closely as possible, the
variable-changing "side effects" specified in the language's virtual
machine, i.e., the points where users would expect variables to be
bound to values in an unoptimized version of the program.

=== Binding point ordering

I'm not convinced that numbering is a sufficient, or even workable,
approach to reconstruct variable value changes as they'd have been
observed in an unoptimized program.  It's not just that, in the
presence of temps and multiple annotations in the same insn, you can't
quite tell whether an annotation is to take effect before or after
that insn.  There are far more interesting issues related to loops.
Consider:

  for (a = 0; ; x = n)
    {
      b = before(x, a);
      n = next(x);
      if (stop(n, x, b, a))
        break;
      a = after(x, b);
    }

before, next, stop and after are suitably-defined inlinable functions.
If my reading is correct, after some machinations in the compiler we'd
end up with something like this:

  <bb 2>:
  a_1 = 0;                              # debug a ORDER 1

  <bb 3>:
  x_2 = PHI <x_0(D) (2), x_8 (4)>;
  a_3 = PHI <a_1 (2), a_7 (4)>;
  b_4 = before(x_2, a_3);               # debug b ORDER 2
  n_5 = next(x_2);                      # debug n ORDER 3
  T_6 = stop(n_5, x_2, b_4, a_3);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;

  <bb 4>:
  a_7 = after(x_2, b_4);                # debug a ORDER 4
  x_8 = n_5;                            # debug x ORDER 5
  goto <bb 3>;

  <bb 5>:

So far so good.  Now let's rock the boat a bit and optimize it:

  <bb 2>:

  <bb 3>:
  x_2 = PHI <x_0(D) (2), n_5 (4)>;
  a_3 = PHI <0 (2), a_7 (4)>;
  b_4 = before(x_2, a_3);               # debug b ORDER 2, a = 0 ORDER 1
  n_5 = next(x_2);                      # debug n ORDER 3, x ORDER 5
  T_6 = stop(n_5, x_2, b_4, a_3);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;

  <bb 4>:
  a_7 = after(x_2, b_4);                # debug a ORDER 4
  goto <bb 3>;

  <bb 5>:

See, we substituted a_1 and x_8 for their equivalences.  But now it
looks like the assignment to a (ORDER 1) is within the loop.  Where
else could we place the annotation, and how could we tell whether or
not it's in the loop?  Given this ambiguity, one might as well
conclude that the assignment to x might have been *after* the loop,
rather than inside it.  How could one tell?

Well, at least there still appears to be some sequential ordering that
makes sense.  But what if we were to apply a common loop optimization:
duplicating the "header", to turn the branch in bb4 into a
fall-through edge, so that the entry test is not part of the loop and
the exit test is at the end of the body?  In this case, this would
involve unrolling part of the body, just because the exit test is not
in the condition of the for statement.  This is not unusual: if we had
an inlined function call in the loop condition, we'd observe a very
similar issue.
This one just makes the problem more visible without adding too much
complexity.  Here's what we'd get:

  b_4 = before(x_0(D), 0);              # debug b ORDER 2, a = 0 ORDER 1
  n_5 = next(x_0(D));                   # debug n ORDER 3, x ORDER 5
  T_6 = stop(n_5, x_0(D), b_4, 0);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;

  <bb 4>:
  x_2 = PHI <x_0(D) (2), n_12 (4)>;
  b_9 = PHI <b_4 (2), b_10 (4)>;
  n_11 = PHI <n_5 (2), n_12 (4)>;
  a_7 = after(x_2, b_9);                # debug a ORDER 4
  ---
  b_10 = before(n_11, a_3);             # debug b ORDER 2, a = 0 ORDER 1
  n_12 = next(n_11);                    # debug n ORDER 3, x ORDER 5
  T_13 = stop(n_12, n_11, b_10, a_7);
  if (T_13 != 0) goto <bb 5>; else goto <bb 4>;

  <bb 5>:

Now, some questions for which we'd need answers in order to be able to
output proper debug information for this loop:

- There are now two statements with each of ORDER 1, 2, 3 and 5, but
  only one with ORDER 4, and that one is inside the loop.  How do you
  tell the duplicates apart and arrange them so as to be able to emit
  proper debug info?

- Even if you could somehow tell apart the duplicates and focus only
  on those that are within the loop, how would you figure out that 4
  and 5 are to come *before* 2 and 3 there, in spite of their numbers,
  i.e., that the assignment to x ORDER 5 is to take the value of n_12
  only in the next iteration, rather than in the same iteration,
  before we even call next(n_11)?

- If the loop body were to be actually unrolled, then the 'x ORDER 5'
  would actually refer to x in another copy of the loop body.  How
  would you deal with that?

- If before, after, next and stop were inlined, the SSA names for
  their incoming arguments within the loop would all have been
  eliminated in favor of the variables in the caller.  You'd also get
  inter-iteration notes for variables initialized to x, just like you
  get them for x.  How do you ensure such variables get the correct
  values?
=== Reordering debug temps

Consider this piece of code:

  dbg = (a + b) + (c + d);
  foo (a + b, c + d);
  #if DEBUG
  dump (dbg);
  #endif

Compiled with -UDEBUG and optimized, this becomes:

  T.1 = a + b;          # debug tmp1 ORDER 2
  T.2 = c + d;          # debug tmp2 ORDER 3, dbg = tmp1 + tmp2 ORDER 1
  (void) foo (T.1, T.2);

But what if the assignments to T.1 and T.2 get reordered by the
scheduler?

  (set (reg T2) (plus (reg C) (mem (...) [D])))
                        # debug tmp2 ORDER 3, dbg = tmp1 + tmp2 ORDER 1
  (set (reg T1) (plus (reg A) (reg B)))
                        # debug tmp1 ORDER 2

What does the note that sets dbg mean?  How do you tell it's the tmp1
that's only going to be bound/computed in the next insn, rather than
some previously-computed tmp1 (say, one in a previous loop iteration)?
Would you have to look at the notes when moving insns about, to ensure
the temp notes don't become nonsensical or change meaning?

=== Maintenance burden

Using a hook on unlinking to tell whether to move elsewhere the notes
attached to an insn, a tuple or an expression is not as simple as it
might sound.  Or, rather, it is that simple; the problem is that it is
*too* simple: it's just not enough.

We might want to start with tree folding, which might simplify
expressions with debug loci to constants, or to other expressions that
have their own loci.  Where would we then tack on the notes that are
about to be thrown away, without so much as an explicit unlinking
operation?

Furthermore, unlinking appears in various different kinds of
operations:

- moving insns without re-emitting: unlink, re-insert
- moving insns with re-emitting: emit pattern, unlink original
- removing insns: unlink
- duplicating insns into multiple blocks: emit, emit, unlink

All of these cases are present in GCC, but if you just hook onto
unlink, you might end up creating temps and attaching notes elsewhere,
assuming insns are going away, when that's not necessary.
And, in the case of moving insns about and using temps, you actually
need to know where you're moving the pattern to, in order to be able
to compute temps properly.

At this point I begin to doubt the wisdom of going through so much
trouble to avoid some small changes in optimizers.  It seems to me
that this is not avoiding them at all; it's actually requiring further
care in the optimizers, rather than the mostly-brainless changes that
VTA has required so far.

=== More?

Depending on the proposed solutions for the problems above, I foresee
further complications, but I'll detail them only once the need arises.
This is long enough as is.

Regards,

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer