Jonathan Worthington wrote:

> FORMAT PROPOSAL...

Great! Anything that brings Parrot closer to being able to report the
HLL filename and line numbers is a good thing!

> SOURCE SEGMENTS
> ... the idea would seem to be 
> that this segment can contain source code.  I suspect the intention of it 
> was to store the source code of high level languages rather than PASM or 
> PIR.

I don't think Parrot should care about what languages are in the source
segments. If someone is writing directly in PASM or PIR, that can go in
a source segment. If someone is writing in a high-level language, that
can go in a source segment. If someone is writing data from which HLL
code is generated by some utility (e.g. yacc, a UML tool, or a GUI
designer), that data can go in a source segment too.

Any kind of source code for which there exists some kind of debugging
tool is a candidate to go into a source segment. This implies that there
could be more than one source segment per .pbc file, and more than one
source location for each opcode. It also implies that (eventually)
parrot will have a way of knowing how to call all the candidate
debuggers for a particular bytecode location (according to which source
language the programmer wants to debug in).
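
To make the "several source segments, several locations per opcode" idea
concrete, here is a minimal sketch in Python. Everything in it is
hypothetical: the names PackFile, SourceSegment, SourceLocation and
locate, and the in-memory layout, are invented for illustration and are
not part of any existing or proposed Parrot format.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSegment:
    language: str   # hypothetical tag, e.g. "PIR", "yacc", "Amber"
    filename: str   # original filename, kept alongside the embedded text
    text: str       # the source itself


@dataclass
class SourceLocation:
    segment: int    # index into PackFile.segments
    line: int       # 1-based line within that segment


@dataclass
class PackFile:
    segments: list = field(default_factory=list)
    # bytecode offset -> list of locations, at most one per source segment
    locations: dict = field(default_factory=dict)

    def add_segment(self, seg):
        self.segments.append(seg)
        return len(self.segments) - 1

    def add_location(self, pc, segment, line):
        self.locations.setdefault(pc, []).append(SourceLocation(segment, line))

    def locate(self, pc, language):
        """Return the location of bytecode offset pc in the given language."""
        for loc in self.locations.get(pc, []):
            if self.segments[loc.segment].language == language:
                return loc
        return None


# One opcode that maps back to both a grammar file and the generated PIR.
pf = PackFile()
grammar = pf.add_segment(SourceSegment("yacc", "calc.y", "expr : expr '+' term ;\n"))
pir = pf.add_segment(SourceSegment("PIR", "calc.pir", ".sub main\n  add $I0, $I1, $I2\n.end\n"))
pf.add_location(0, grammar, 1)
pf.add_location(0, pir, 2)

print(pf.locate(0, "PIR"))
print(pf.locate(0, "yacc"))
```

A debugger for a given language would ask only for locations in that
language and ignore the rest. Each segment carries a filename as well as
the text, so tools that want the filename can still get it.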

[Incidentally, source segments may also meet the needs of those who wish
to distribute source with every application, without burdening those who
just want to run the compiled code.]

...
> 2) Allowing for a reference into the source segment in place of a filename.

Some development tools are still going to want the filename, even if
there is a corresponding source segment in the .pbc file. I think it
should be possible to include both.

> COMPATIBILITY
> This change is incompatible with the current debug segment format.  But 
> that's OK, we're still in development.

Sure, but if we're going to change it, let's change it to something
general that won't need to be changed again after version 1.0 is
released.

This is something that Dan Sugalski mooted in his "WCB: Full bytecode
metadata" blog entry:
http://www.sidhe.org/~dan/blog/archives/000419.html

I like the idea that each HLL can store whatever kind of metadata it
wants. In particular, I'd like to have my Amber compiler put column
numbers as well as line numbers into the .pbc file, and perhaps even
information about which optimizations it has applied.

> 3) Still being space-efficient on disk

Source segments should probably be compressed. There's a lot of
repetition and whitespace in most source languages, so they tend to
compress really well. Any reference into the source would be an offset
into the uncompressed source (which would only need to be uncompressed
during debugging runs).
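
A rough illustration of that scheme, using Python's zlib module. The
choice of zlib, the helper name source_at and the sample source text are
all assumptions made for the example; nothing here is a statement about
what Parrot does or should use.

```python
import zlib

# Sample "source segment": repetitive, indented text, as most source is.
source = "".join(f"    print_line({i})\n" for i in range(200))
raw = source.encode("utf-8")

# What would be written to disk.
packed = zlib.compress(raw, 9)


def source_at(packed_segment, offset, length):
    """Decompress on demand and slice at a byte offset into the
    uncompressed text. Only a debugging run would ever call this."""
    data = zlib.decompress(packed_segment)
    return data[offset:offset + length].decode("utf-8")


# A reference is an offset into the *uncompressed* source.
offset = raw.index(b"print_line(42)")
snippet = source_at(packed, offset, len("print_line(42)"))

print(len(raw), len(packed))
print(snippet)
```

Because the offsets refer to the uncompressed text, the compression
method could be changed later without touching any of the references.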

> The opcode stream will contain one line number per
> bytecode instruction.

You are proposing to use a chain of mappings to record the filename; why
not use the same system for recording all kinds of metadata including
line numbers? Sure, there's a small performance penalty - only during
debugging runs - but there's a worthwhile space saving on disk (because
typical HLLs produce a lot of bytecodes per line of source).
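
One possible reading of this, sketched in Python: instead of one line
number per instruction, store one entry per run of instructions that
share a line, and look the line up by binary search. The table contents
and the names line_map and line_for are made up for the example, and
this is not necessarily the mapping scheme in the original proposal.

```python
from bisect import bisect_right

# (first bytecode offset, source line) pairs, sorted by offset.
# Each entry covers every instruction up to the next entry's offset.
line_map = [(0, 1), (7, 2), (19, 3), (23, 5)]
starts = [start for start, _ in line_map]


def line_for(pc):
    """Return the source line for bytecode offset pc."""
    i = bisect_right(starts, pc) - 1
    if i < 0:
        raise ValueError(f"no line information for offset {pc}")
    return line_map[i][1]


for pc in (0, 6, 7, 22, 23):
    print(pc, line_for(pc))
```

With 30 instructions, a per-instruction table would need 30 entries
where this one needs 4. The lookup costs a binary search, but only a
debugger or an error reporter ever pays for it.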

Regards,
Roger Browne

