RFC: static line number information

Juergen Boemmels Mon, 07 Oct 2002 11:11:49 -0700

From TODO:
    Metadata (source line number info, symbol table)

Currently the line number information in parrot is handled via
special opcodes, namely setline/getline and setfile/getfile. This is a
good solution when you write an interpreter in parrot and the line
number information is only known at runtime. But this approach is very
inefficient if you have a tight loop like this:


$i = 0;
while ($i < 1000) {
  $i++;
}

With line number information enabled, this would translate to something
like this:

      setline 1
      set I0,0
LOOP: setline 2
      lt I0, 1000, DONE
      setline 3
      add I0,1
      branch LOOP
DONE: setline 5

This is inefficient because two setline ops are executed on every
iteration of the loop.

A possible solution to this problem is doing it the same way a
C compiler does: add an extra structure to the executable which can
translate the current program counter to the source line. The advantage
of this approach is that the line number is only decoded when it's
needed, and only an application which uses the line number
information pays a runtime cost; the disadvantage is that the line
number information must be known at compile time (which I think is the
common case).
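
A minimal sketch of such a lookup, in Python only for brevity. The
offsets in LINE_TABLE are made up for the loop above (they assume one
word per opcode and one per operand), and line_for_pc is a hypothetical
name, not an existing parrot function. Each row marks the first
bytecode offset that belongs to a source line, so a binary search finds
the line for any program counter:

```python
from bisect import bisect_right

# Hypothetical line table for the loop example: (bytecode offset, source line).
# Rows are sorted by offset; a row covers every offset up to the next row.
LINE_TABLE = [(0, 1), (3, 2), (7, 3), (12, 5)]


def line_for_pc(table, pc):
    """Return the source line for pc, or None if pc precedes the first row."""
    starts = [start for start, _ in table]
    index = bisect_right(starts, pc) - 1  # last row starting at or before pc
    if index < 0:
        return None
    return table[index][1]


if __name__ == "__main__":
    for pc in (0, 3, 10, 12):
        print(pc, line_for_pc(LINE_TABLE, pc))
```

The table is only consulted when somebody asks for a line number, so
the loop itself executes no extra ops.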

This can be implemented in 2 ways:
- Create our own debugging format
- Use an already existing one
The first way might be more fun, but I think the second one would be
better. IMHO we should use DWARF-2. The Mono Project does something
similar.

To get this working 3 things must happen.

1.) The packfile format must be extended to contain a section with
debugging information.

Changing the packfile is not an easy task, because many parts of
parrot depend on it. The ones I remember are packfile.c, assemble.pl,
and somewhere in imcc.

In principle the packfile is extendible in a backward compatible
way. At the moment there are (according to parrotbyte.pod) 3 segments
(FIXUP, CONSTANT, BYTECODE) in exactly that order. This can be easily
extended by just adding a 4th one, DEBUG_LINE (or .debug_line or
.stabs). But doing some more extensions (e.g. call frames, language
dependent sections) by allocating numbers in a linear chain will be
painful.

Another extension scheme would be to make the 4th section a
directory section, in which all packfile extension sections can be
looked up by name. This is still a backward-compatible solution.

But why use the 4th section as the directory section? Naturally it
would be the first one. Since FIXUP is not used at the moment, this is
not such a drastic change as it first sounds.
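
To make the directory idea concrete, here is a rough sketch in Python.
The on-disk layout (an entry count followed by fixed-size entries with
a 16-byte name, an offset and a size) is purely an assumption for
illustration; it is not the layout described in parrotbyte.pod, and
pack_segments/find_segment are hypothetical helpers:

```python
import struct

# Hypothetical layout: an entry count, then one fixed-size entry per segment
# holding a NUL-padded name, a byte offset into the file, and a byte length.
COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<16sII")


def pack_segments(segments):
    """Serialize a list of (name, payload) pairs behind a directory."""
    offset = COUNT.size + ENTRY.size * len(segments)  # first payload starts here
    directory = [COUNT.pack(len(segments))]
    payloads = []
    for name, payload in segments:
        directory.append(ENTRY.pack(name.encode("ascii"), offset, len(payload)))
        payloads.append(payload)
        offset += len(payload)
    return b"".join(directory + payloads)


def find_segment(blob, wanted):
    """Return the payload of the named segment, or None if it is absent."""
    (count,) = COUNT.unpack_from(blob, 0)
    for i in range(count):
        raw_name, offset, size = ENTRY.unpack_from(blob, COUNT.size + i * ENTRY.size)
        if raw_name.rstrip(b"\0").decode("ascii") == wanted:
            return blob[offset:offset + size]
    return None
```

A reader that does not know a segment name simply never asks for it,
so new segments can be added without renumbering the existing ones.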

2.) The assembler must emit the debugging information.

Emitting the debugging information from pure assembly code is not
really complicated, because the address and linenumber are always
increasing, the address increment is defined only by the current line
and the basic blocks can be easily analyzed.

But there must also be a way for higher level languages to assign
line numbers. Maybe C-like directives such as
    #line 1 "foo.c"
are a solution, or we create dedicated assembler macros:
    .line
    .file
    .column (maybe)
3.) The debugger must read this information.

I have some ugly little code lying around that reads the line number
information out of an ELF binary. I can fix this up and integrate it,
but I don't want to do the last step first.

Bonus point.) Teach the JIT-engine to translate the line number
information, so that you can debug a JITed program with gdb.

Comments?
b.
-- 
Juergen Boemmels                        [EMAIL PROTECTED]
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47
