Hi,

The current format of the debug segment in Parrot packfiles (.pbc files), as documented in doc/parrotbyte.pod, only allows a single source file to be named. This has been insufficient ever since .include directives were introduced; it also means that there's nothing sensible that pbc_merge can do with the debug segments it finds in its input files.

WHAT WE HAVE NOW
Currently, we store two things:
1) The filename of a single source file, as an additional field in the header
2) The line number in the source file for each bytecode instruction, as the segment's opcode stream

WHAT SOURCE?
The debug segment as we currently have it relates to PIR and PASM source files, not to high level language source files. Currently PIR parses a directive that looks like this:
   #line 'filename'
This is for compilers to supply the line numbers and file names of HLL source files. Currently, nothing is done with these directives after they are parsed, but the data they provide should go into a separate HLL debug segment.

As the needs of the PASM/PIR debug segments and the HLL debug segments would seem to be the same, this proposal will detail a single format that should work for both of them. If it is determined that the HLL debug segment needs something more sophisticated, this proposal still stands for the PASM/PIR debug segment.

SOURCE SEGMENTS
This is currently mentioned in parrotbyte.pod; the idea would seem to be that this segment can contain source code. I suspect the intention of it was to store the source code of high level languages rather than PASM or PIR. I think the doc is correct in stating that this segment is currently unused. However, in the future it likely will be, so it makes sense to consider its future existence now while re-designing the debug segment(s).

FORMAT PROPOSAL
The aims of the new format, intended for both the PASM/PIR debug segment and the HLL debug segment, are:
1) Supporting multiple input files
2) Allowing for a reference into the source segment in place of a filename
3) Still being space-efficient on disk

The opcode stream will contain one line number per bytecode instruction. No information as to what file that line is in will be stored in this stream. (This is pretty much the same as what we have now).

The header (after the standard stuff that every header has) will start with a count of the source-to-bytecode position mappings that the header contains.

 0 (relative)
 +----------+----------+----------+----------+
 | number of source => bytecode mappings     |
 +----------+----------+----------+----------+

A source to bytecode position mapping simply states that the bytecode from the specified offset up to the offset given in the next mapping (or, if there is no next mapping, up to the end of the bytecode) has its source in location X.
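
To make the "in effect until the next mapping" rule concrete, here is a minimal lookup sketch in Python. The in-memory table layout, the function name source_for and the file names are all made up for illustration; nothing like this exists in Parrot today.

```python
from bisect import bisect_right

# Hypothetical in-memory form of the mapping table: (bytecode_offset, source)
# pairs, sorted by offset. The file names are illustrative only.
mappings = [
    (0, "main.pir"),
    (40, "library.pir"),   # e.g. code that came from a .include
    (100, "main.pir"),
]

def source_for(offset, mappings):
    """Return the source covering `offset`, or None if no mapping precedes it."""
    starts = [start for start, _ in mappings]
    # bisect_right gives the index of the first mapping that starts after
    # `offset`; the mapping just before that one is the one in effect.
    index = bisect_right(starts, offset) - 1
    if index < 0:
        return None
    return mappings[index][1]
```

With this table, offsets 40 to 99 resolve to library.pir, and everything from offset 100 to the end of the bytecode resolves to main.pir again.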

A mapping always starts with the offset in the bytecode, followed by the type of the mapping.

 0 (relative)
 +----------+----------+----------+----------+
 |              bytecode offset              |
 +----------+----------+----------+----------+

 4
 +----------+----------+----------+----------+
 |               mapping type                |
 +----------+----------+----------+----------+

There are 3 mapping types.

Type 0 means there is no source available for the bytecode starting at the given offset. No further data is stored with this type of mapping; the next mapping continues immediately after it.

Type 1 means the source is available in a file. A null-terminated string containing the filename follows.

Type 2 means the source is available in a source segment. Another integer follows, which will specify which source file in the source segment to use.
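
As a rough illustration of the layout described above, here is a Python sketch that writes and reads the mapping part of the header. Several things in it are my assumptions rather than part of the proposal: every integer is a 4-byte little-endian unsigned value (a real packfile would use its own word size and byte order), filenames are UTF-8, and no padding follows the filename string. The function and constant names are hypothetical.

```python
import struct

# Mapping types from the proposal; the constant names are made up here.
NO_SOURCE, SOURCE_FILE, SOURCE_SEGMENT = 0, 1, 2

def pack_mappings(mappings):
    """Serialise a list of (offset, type, payload) tuples."""
    out = [struct.pack("<I", len(mappings))]          # count of mappings
    for offset, kind, payload in mappings:
        out.append(struct.pack("<II", offset, kind))  # offset, then type
        if kind == SOURCE_FILE:
            out.append(payload.encode("utf-8") + b"\0")
        elif kind == SOURCE_SEGMENT:
            out.append(struct.pack("<I", payload))    # index into source segment
        elif kind != NO_SOURCE:
            raise ValueError(f"unknown mapping type {kind}")
    return b"".join(out)

def unpack_mappings(data):
    """Inverse of pack_mappings; returns a list of (offset, type, payload)."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    mappings = []
    for _ in range(count):
        offset, kind = struct.unpack_from("<II", data, pos)
        pos += 8
        if kind == NO_SOURCE:
            payload = None                            # nothing else is stored
        elif kind == SOURCE_FILE:
            end = data.index(b"\0", pos)              # find the terminator
            payload = data[pos:end].decode("utf-8")
            pos = end + 1
        elif kind == SOURCE_SEGMENT:
            (payload,) = struct.unpack_from("<I", data, pos)
            pos += 4
        else:
            raise ValueError(f"unknown mapping type {kind}")
        mappings.append((offset, kind, payload))
    return mappings
```

A table such as [(0, SOURCE_FILE, "main.pir"), (40, NO_SOURCE, None), (64, SOURCE_SEGMENT, 3)] round-trips through these two functions unchanged. Note that a reader has to walk the mappings in order, since type 1 entries are variable length.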

Note that the mappings must appear in ascending order of bytecode offset; a mapping for offset 100 cannot follow a mapping for offset 200, for example.
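
The ordering rule is also what would let a tool like pbc_merge do something sensible: shift each input's offsets by the amount of bytecode already emitted and concatenate the tables. The Python sketch below shows one way that could work. It is not how pbc_merge behaves today; the function names are hypothetical, None stands in for a type 0 mapping, lengths and offsets are assumed to be in the same units, and treating two mappings at the same offset as invalid is my assumption (the proposal does not say).

```python
def check_ascending(mappings):
    """True if offsets strictly increase (strictness is an assumption)."""
    offsets = [offset for offset, _ in mappings]
    return all(a < b for a, b in zip(offsets, offsets[1:]))

def merge_mappings(inputs):
    """Combine per-file tables into one.

    `inputs` is a list of (bytecode_length, mappings) pairs in merge order,
    where each mappings list holds (offset, source) pairs.
    """
    merged = []
    base = 0  # bytecode emitted so far by earlier inputs
    for length, mappings in inputs:
        if length == 0:
            continue  # contributes no bytecode, so no mappings either
        if not check_ascending(mappings):
            raise ValueError("mappings are not in ascending offset order")
        if not mappings or mappings[0][0] != 0:
            # Stop the previous input's last mapping from spilling over
            # into bytecode whose source we do not know.
            merged.append((base, None))
        for offset, source in mappings:
            merged.append((base + offset, source))
        base += length
    return merged
```

Because each input's offsets are ascending and every base is at least as large as the previous input's length, the merged table stays in ascending order without any sorting.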

COMPATIBILITY
This change is incompatible with the current debug segment format. But that's OK, we're still in development.

Comments on this would be very welcome, even if it's as simple as "looks OK to me" or "looks terrible to me". :-)

Thanks,

Jonathan
