Jonathan Worthington wrote:
> FORMAT PROPOSAL...
Great! Anything that brings Parrot closer to being able to report the HLL filename and line numbers is a good thing!

> SOURCE SEGMENTS
> ... the idea would seem to be
> that this segment can contain source code. I suspect the intention of it
> was to store the source code of high level languages rather than PASM or
> PIR.

I don't think Parrot should care about what languages are in the source segments. If someone is writing directly in PASM or PIR, that can go in a source segment. If someone is writing in a high-level language, that can go in a source segment. If someone is writing data from which HLL code is generated by some utility (e.g. yacc, a UML tool, or a GUI designer), that data can go in a source segment too.

Any kind of source code for which there exists some kind of debugging tool is a candidate to go into a source segment. This implies that there could be more than one source segment per .pbc file, and more than one source location for each opcode. It also implies that (eventually) Parrot will have a way of knowing how to call all the candidate debuggers for a particular bytecode location (according to which source language the programmer wants to debug in).

[Incidentally, source segments may also meet the needs of those who wish to distribute source with every application, without burdening those who just want to run the compiled code.]

...

> 2) Allowing for a reference into the source segment in place of a filename.

Some development tools are still going to want the filename, even if there is a corresponding source segment in the .pbc file. I think it should be possible to include both.

> COMPATIBILITY
> This change is incompatible with the current debug segment format. But
> that's OK, we're still in development.

Sure, but if we're going to change it, let's change it to something general that won't need to be changed again after version 1.0 is released.
This is something that Dan Sugalski mooted in his "WCB: Full bytecode metadata" blog entry:

http://www.sidhe.org/~dan/blog/archives/000419.html

I like the idea that each HLL can store whatever kind of metadata it wants. In particular, I'd like to have my Amber compiler put column numbers as well as line numbers into the .pbc file, and perhaps even information about which optimizations it has applied.

> 3) Still being space-efficient on disk

Source segments should probably be compressed. There's a lot of repetition and whitespace in most source languages, so they tend to compress really well. Any reference into the source would be an offset into the uncompressed source (which would only need to be uncompressed during debugging runs).

> The opcode stream will contain one line number per
> bytecode instruction.

You are proposing to use a chain of mappings to record the filename; why not use the same system for recording all kinds of metadata including line numbers? Sure, there's a small performance penalty - only during debugging runs - but there's a worthwhile space saving on disk (because typical HLLs produce a lot of bytecodes per line of source).

Regards,
Roger Browne
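
The mapping idea above (recording a source position only where it changes, rather than one line number per bytecode instruction) can be sketched roughly as follows. This is only an illustration in Python, not Parrot's actual debug segment format: the table layout, the MAPPINGS data, and the lookup() helper are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical sparse debug table (an assumption for illustration, not the
# real .pbc layout). One entry is stored per change of source position,
# instead of one entry per bytecode instruction.
# Each entry is (first_opcode_offset, line, column), sorted by offset.
MAPPINGS = [
    (0, 1, 1),    # opcodes 0..6 come from line 1, column 1
    (7, 2, 5),    # opcodes 7..18 come from line 2, column 5
    (19, 4, 1),   # opcodes 19 onward come from line 4, column 1
]


def lookup(mappings, opcode_offset):
    """Return (line, column) for the entry covering opcode_offset.

    Returns None if the offset precedes the first entry.
    """
    starts = [start for start, _, _ in mappings]
    # Find the last entry whose starting offset is <= opcode_offset.
    index = bisect_right(starts, opcode_offset) - 1
    if index < 0:
        return None
    _, line, column = mappings[index]
    return (line, column)
```

A debugger would call something like lookup(MAPPINGS, 12) only when it needs to report a location, so the binary search cost is paid during debugging runs and the table stays small when each source line produces many opcodes.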