On Tue, Oct 15, 2019 at 9:58 PM Luis Machado <luis.mach...@linaro.org> wrote:
>
> Hi,
>
> I'd like to get some feedback from the compiler's side before
> implementing a fix for this line numbering problem. I also want to make
> sure i fix it in the right tool.
>
> This is related to this bug report in GDB's bugzilla:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21221
>
> It deals with the cases where we have loops with empty bodies, empty
> headers (for loops) or that simply were written in a single line. This
> causes GCC to not emit line transitions in one way or another. As a
> consequence, GDB won't see the line transition and will continuously
> attempt to step/next until it sees one.
>
> For the end user it appears GDB is stuck in a particular loop, with most
> of them hitting ctrl-C to interrupt it. In reality GDB is making
> progress in the loop, but it will only stop once it goes out of the
> loop, where it will see a line transition.
>
> For the sake of reducing the scope of the problem, I'll assume the loops
> are written across multiple lines and that we're interested in O0
> debugging. Higher optimization levels would probably reshape the loop or
> reduce it to a single instruction in some cases.
>
> Take for example the case of BZ #21221...
>
> int main (void)
> {
>    while (1)
>      {
> 5      for (unsigned int i = 0U; i < 0xFFFFFU; i++)
> 6        {
> 7          ;
> 8        }
>      }
> }
>
> GCC generates the following code:
>
>    0x00000000000005fa <+0>:     push   %rbp
>    0x00000000000005fb <+1>:     mov    %rsp,%rbp
>    0x00000000000005fe <+4>:     movl   $0x0,-0x4(%rbp)
>    0x0000000000000605 <+11>:    jmp    0x60b <main+17>
>    0x0000000000000607 <+13>:    addl   $0x1,-0x4(%rbp)
>    0x000000000000060b <+17>:    cmpl   $0xffffe,-0x4(%rbp)
>    0x0000000000000612 <+24>:    jbe    0x607 <main+13>
>    0x0000000000000614 <+26>:    jmp    0x5fe <main+4>
>
> And the line table looks like this:
>
>   Line Number Statements:
>    [0x00000047]  Extended opcode 2: set Address to 0x5fa
>    [0x00000052]  Special opcode 6: advance Address by 0 to 0x5fa and Line by 1 to 2
>    [0x00000053]  Special opcode 64: advance Address by 4 to 0x5fe and Line by 3 to 5
>    [0x00000054]  Extended opcode 4: set Discriminator to 3
>    [0x00000058]  Set is_stmt to 0
>    [0x00000059]  Special opcode 131: advance Address by 9 to 0x607 and Line by 0 to 5
>    [0x0000005a]  Extended opcode 4: set Discriminator to 1
>    [0x0000005e]  Special opcode 61: advance Address by 4 to 0x60b and Line by 0 to 5
>    [0x0000005f]  Special opcode 131: advance Address by 9 to 0x614 and Line by 0 to 5
>    [0x00000060]  Advance PC by 2 to 0x616
>    [0x00000062]  Extended opcode 1: End of Sequence
>
> GCC doesn't generate any code or line number transitions for the empty
> loop body, therefore GDB keeps cycling inside this loop, in line 5.
>
> Clang, on the other hand, seems to be a bit smarter about this and will
> generate a dummy jump to help the debugger.
>
> Here's Clang's code:
>
>     0x00000000004004a0 <+0>:     push   %rbp
>     0x00000000004004a1 <+1>:     mov    %rsp,%rbp
>     0x00000000004004a4 <+4>:     movl   $0x0,-0x4(%rbp)
>     0x00000000004004ab <+11>:    movl   $0x0,-0x8(%rbp)
>     0x00000000004004b2 <+18>:    cmpl   $0xfffff,-0x8(%rbp)
>     0x00000000004004b9 <+25>:    jae    0x4004d2 <main+50>
> X   0x00000000004004bf <+31>:    jmpq   0x4004c4 <main+36>
> X   0x00000000004004c4 <+36>:    mov    -0x8(%rbp),%eax
>     0x00000000004004c7 <+39>:    add    $0x1,%eax
>     0x00000000004004ca <+42>:    mov    %eax,-0x8(%rbp)
>     0x00000000004004cd <+45>:    jmpq   0x4004b2 <main+18>
>     0x00000000004004d2 <+50>:    jmpq   0x4004ab <main+11>
>
> X marks the spot where a dummy jump was inserted to aid the debugger.
> The line table looks like this:
>
>   Line Number Statements:
>     [0x00000070]  Extended opcode 2: set Address to 0x4004a0
>     [0x0000007b]  Special opcode 6: advance Address by 0 to 0x4004a0 and Line by 1 to 2
>     [0x0000007c]  Set column to 23
>     [0x0000007e]  Set prologue_end to true
>     [0x0000007f]  Special opcode 162: advance Address by 11 to 0x4004ab and Line by 3 to 5
>     [0x00000080]  Set column to 33
>     [0x00000082]  Set is_stmt to 0
>     [0x00000083]  Special opcode 103: advance Address by 7 to 0x4004b2 and Line by 0 to 5
>     [0x00000084]  Set column to 5
>     [0x00000086]  Set is_stmt to 1
>     [0x00000087]  Special opcode 103: advance Address by 7 to 0x4004b9 and Line by 0 to 5
> X   [0x00000088]  Special opcode 92: advance Address by 6 to 0x4004bf and Line by 3 to 8
> X   [0x00000089]  Set column to 46
>     [0x0000008b]  Special opcode 72: advance Address by 5 to 0x4004c4 and Line by -3 to 5
>     [0x0000008c]  Set column to 5
>     [0x0000008e]  Set is_stmt to 0
>     [0x0000008f]  Special opcode 131: advance Address by 9 to 0x4004cd and Line by 0 to 5
>     [0x00000090]  Set column to 3
>     [0x00000092]  Set is_stmt to 1
>     [0x00000093]  Special opcode 73: advance Address by 5 to 0x4004d2 and Line by -2 to 3
>     [0x00000094]  Advance PC by 5 to 0x4004d7
>     [0x00000096]  Extended opcode 1: End of Sequence
>
> Again, X marks the spot where we tell the debugger there is a line
> transition (from line 5 to line 8), and so step/next execution should end.
>
> I'm inclined to say we should fix this in GCC in a similar way. GDB
> relies on the line table information since it can't correctly tell when
> we have transitioned to a new source line by looking just at the
> instruction stream.
>
> My idea is to create a dummy jump (gimple sounds more appropriate) with
> the source location of the last line of the loop body (in this case line
> number 8). That would trigger the creation of a new line table entry,
> making GDB happy.
>
> Is there a better way to force the compiler to output such a line table
> transition without having to resort to a dummy jump? Is there a safer
> way to add such transitions without worrying about the optimizer getting
> rid of them later on? Should we even worry about preserving such
> information for higher optimization levels?
>
> I'll also need a way to store the source location of the last line of
> the loop body, since closing braces and friends are ignored by GCC for
> code generation purposes. We just consume those tokens without second
> thought.
>
> There are other interesting variations, like the following:
>
> int main(void)
> {
>    int var = 0;
>
>    for (;;)
>      {
> 7      var++;
> 8    }
>
>    return 0;
> }
>
> In the case above, the debugger gets stuck in line 7. With the proposed
> solution it would transition to line 8 and then return to line 7.
>
> Another case is this one:
>
> int main (void)
> {
>    while (1)
>      {
> 5      for (unsigned int i = 0U; i < 0xFFFFFU; i++)
> 6        ;
>      }
> }
>
> Similarly, GDB gets stuck in line 5. With the proposed fix, it would
> transition to line 6 before returning to line 5.
>
> Feedback would be greatly appreciated.

I think that adding an extra jump is unwanted.  Instead, if you
disregard the single-source-line case, there's always the jump
and the label we jump to, which might/should get different
source locations.  Like in one of the above cases:

main ()
{
  int D.1803;

  [t.c:2:1] {
    int var;

    [t.c:3:5] var = 0;
    <D.1801>:
    [t.c:7:8] var = var + 1;
    [t.c:7:8] goto <D.1801>;
    [t.c:10:8] D.1803 = 0;
    [t.c:10:8] return D.1803;

seen at GIMPLE.  Of course we lose the label once we build the CFG,
but we retain a goto-locus on the edge, which we could then put back
on the jump statement.  For this case we currently get

.L2:
        .loc 1 7 0 discriminator 1
        addl    $1, -4(%rbp)
        jmp     .L2

and we could do

.L2:
        .loc 1 7 0 discriminator 1
        addl    $1, -4(%rbp)
        .loc 1 5 0
        jmp     .L2

thus assign the "destination" location to the jump instruction?

The first question is of course what happens with the edge's
goto_locus at the moment and why we get the code we get.

The above solution might also be a bit odd since for the loop
entry we'd first see line 7 and only after that line 5.  But fixing
that would mean we have to output an extra instruction (where
I'd choose a nop instead of some random extra jump).

Richard.

> Thanks,
> Luis
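
To make the stepping behaviour discussed in this thread concrete, here is a
minimal sketch in C of how a debugger might decide when a "step" is finished.
It is a simplified model, not GDB's actual implementation: the struct
line_row type and the functions row_for_pc and step_stops_at are hypothetical
names invented for this illustration.  The rule it assumes is that a step
ends only when the PC lands exactly on the first address of an is_stmt row
whose line differs from the starting line.  The two tables are transcribed
from the GCC and Clang line-number dumps quoted above (columns and
discriminators dropped), and the traces are hand-written walks through the
quoted disassembly.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* One row of a much simplified line table (hypothetical type). */
struct line_row {
    unsigned long addr;  /* first address covered by this row */
    int line;            /* source line number */
    int is_stmt;         /* nonzero: recommended stopping point */
};

/* Return the row covering pc, or NULL if pc precedes the table.
   Rows must be sorted by ascending address. */
const struct line_row *
row_for_pc(const struct line_row *rows, size_t n, unsigned long pc)
{
    const struct line_row *found = NULL;
    for (size_t i = 0; i < n && rows[i].addr <= pc; i++)
        found = &rows[i];
    return found;
}

/* Assumed stepping rule (a model, not GDB's code): walk the executed PCs
   and return the index of the first one that sits at the start of an
   is_stmt row for a line other than the starting line.  Returns -1 if
   the trace never reaches such a row. */
long
step_stops_at(const struct line_row *rows, size_t n,
              const unsigned long *trace, size_t len)
{
    const struct line_row *start = row_for_pc(rows, n, trace[0]);
    if (start == NULL)
        return -1;
    for (size_t i = 1; i < len; i++) {
        const struct line_row *r = row_for_pc(rows, n, trace[i]);
        if (r != NULL && r->is_stmt && r->addr == trace[i]
            && r->line != start->line)
            return (long)i;
    }
    return -1;
}

/* Rows from the GCC line-number dump quoted above. */
const struct line_row gcc_rows[] = {
    { 0x5fa, 2, 1 }, { 0x5fe, 5, 1 }, { 0x607, 5, 0 },
    { 0x60b, 5, 0 }, { 0x614, 5, 0 },
};

/* Loop entry plus two iterations of the inner loop in the GCC code. */
const unsigned long gcc_trace[] = {
    0x5fe, 0x605, 0x60b, 0x612, 0x607, 0x60b, 0x612, 0x607, 0x60b, 0x612,
};

/* Rows from the Clang line-number dump quoted above. */
const struct line_row clang_rows[] = {
    { 0x4004a0, 2, 1 }, { 0x4004ab, 5, 1 }, { 0x4004b2, 5, 0 },
    { 0x4004b9, 5, 1 }, { 0x4004bf, 8, 1 }, { 0x4004c4, 5, 1 },
    { 0x4004cd, 5, 0 }, { 0x4004d2, 3, 1 },
};

/* Loop entry up to and past the dummy jump in the Clang code. */
const unsigned long clang_trace[] = {
    0x4004ab, 0x4004b2, 0x4004b9, 0x4004bf, 0x4004c4,
};
```

Under this model, step_stops_at returns -1 for the GCC table and trace, since
every row reached belongs to line 5, and 3 for the Clang table and trace,
because the fourth PC (0x4004bf) starts the is_stmt row for line 8.  The same
function could be used to try out other placements of an extra row, such as
giving the backward jump the location of the loop header.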