On Sun, Sep 25, 2005 at 09:43:15PM -0700, Chip Salzenberg wrote:
> On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> > * The magic number is no longer an opcode outside the header.  It is
> >   now an 8 byte magic string at the beginning of the header.
>
> I should think four would do, but no matter.

It's so 'large' because of an idea 'borrowed' from the PNG spec.  One or
more of bytes 0 & 4-7 are likely to be damaged by common transport
encoding errors.  I've changed my proposal to explicitly note this.

> > * Bytes 20 through 31 are now padding so the core.ops fingerprint can
> >   be expanded in the future.
>
> Marvy.  Important note: All those bytes *must* be zeros in the current
> implementation.  See below.

That was already in my proposal but I've changed the wording to include
I<MUST>.

> > * Do we need to keep the Opcode Type?  It's not clear to me what it's
> >   used for.
> >
> >  +----------+----------+----------+----------+
> >  |      Opcode Type (Perl = 0x5045524c)      |
> >  +----------+----------+----------+----------+
>
> I don't think it's useful.  A pbc file is Parrot byte code; if Parrot
> learns to translate .NET, Python, or JVM files, it'll read them in
> their native formats.

Sounds reasonable.  It's been dumped.

> > * Does it make sense to use a fixed size header?  The offset of the
> >   first segment could be calculated by multiplying an "offset byte"
> >   and the wordsize.
>
> We don't have to decide that.  A fixed size header now does not
> foreclose the possibility that byte #31 will be that "how many more
> words should be considered part of the header" feature you suggest.

Fair enough.

An updated patch is attached.

-J

--
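To illustrate why bytes 0 and 4-7 earn their keep, here is a rough Python sketch (not part of the patch) that pushes the proposed signature through three common kinds of transport damage. The helper names and the all-zero 24-byte remainder are made up for the demo; only the eight signature bytes come from the proposal.

```python
# Sketch only: simulates transport damage against the proposed signature.
# Helper names are hypothetical; the 8 signature bytes are from the proposal.
PBC_MAGIC = b"\xfePBC\r\n\x1a\n"


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def lf_to_crlf(data: bytes) -> bytes:
    """Simulate a text-mode transfer that turns every LF into CR LF."""
    return data.replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Simulate a text-mode transfer that turns every CR LF into LF."""
    return data.replace(b"\r\n", b"\n")


def signature_ok(data: bytes) -> bool:
    """True if the first eight bytes are the intact signature."""
    return data[:8] == PBC_MAGIC


# A stand-in file: the signature followed by 24 zero bytes.
sample = PBC_MAGIC + bytes(24)

for name, damage in [
    ("untouched", lambda d: d),
    ("7-bit stripped", strip_high_bit),
    ("LF -> CRLF", lf_to_crlf),
    ("CRLF -> LF", crlf_to_lf),
]:
    print(f"{name}: signature ok = {signature_ok(damage(sample))}")
```

Byte 0 (0xfe) catches the 7-bit case, the CR LF pair at bytes 4-5 catches CRLF-to-LF translation, and the trailing LF at byte 7 catches LF-to-CRLF translation. A four byte magic of plain ASCII would pass all three unnoticed.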
Index: docs/parrotbyte.pod
===================================================================
--- docs/parrotbyte.pod	(revision 9235)
+++ docs/parrotbyte.pod	(working copy)
@@ -7,8 +7,33 @@
 
 =head1 Format of the Parrot bytecode
 
+Parrot's bytecode format consists of a small endian-neutral header region
+followed by a series of segments.  ALL words (non-bytes) following the header
+are stored in native order, unless otherwise specified.
+
+=head1 PBC Header
+
+The PBC header is a fixed 32 bytes in length.  Header values are all encoded as
+either a single byte or a string so that it can be parsed without having to
+consider the endianness of the data.
+
   0          1          2          3
  +----------+----------+----------+----------+
+ |   0xfe       0x50       0x42       0x43   |
+ +----------+----------+----------+----------+
+ |   0x0d       0x0a       0x1a       0x0a   |
+ +----------+----------+----------+----------+
+
+The header begins with an eight byte I<File Signature> or I<Magic String>.
+This is equivalent to the C strings C<\376PBC\r\n\032\n> (ASCII) and
+C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a> sans the terminating C<NULL> bytes.  Bytes
+0 and 4-7 are designed to catch common types of file corruption caused by
+transport encoding mechanisms (for example, FTP ASCII transfers).  This format
+was inspired by the PNG Specification.  Please see RFC 2083 for an explanation
+of the advantages of this strategy.
+
+  8          9          10         11
+ +----------+----------+----------+----------+
  | Wordsize | Byteorder|  Major   |  Minor   |
  +----------+----------+----------+----------+
 
@@ -20,7 +45,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4          5
+  12         13         14
  +----------+----------+----------+----------+
  | INT size | FloatType| 10 Byte ...         |
  +----------+----------+----------+----------+
@@ -29,26 +54,40 @@
  |             core.ops is here              |
  +----------+----------+----------+----------+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  20         21         22         23
  +----------+----------+----------+----------+
- |         Parrot Magic = 0x 13155a1         |
+ |                  padding                  |
  +----------+----------+----------+----------+
-
-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify.  More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
-
-  20*
+ |                                           |
  +----------+----------+----------+----------+
- |      Opcode Type (Perl = 0x5045524c)      |
+ |                                           |
  +----------+----------+----------+----------+
 
-The asterisk for the offset states, from here we have opcodes. The given
-offsets are for 32 bit opcode types only.
+Following the core.ops fingerprint, the header I<MUST> be padded with C<NULL>
+bytes to be an overall 32 bytes in length.
 
+All words following the header will be interpreted as Op codes.
+
+=head2 Magic Description
+
+The following is a C<file(1)> description of the PBC Header format.
+
+  0     string  \xfe\x50\x42\x43\x0d\x0a\x1a\x0a  Parrot Bytecode (PBC)
+  >10   byte    x
+  >11   byte    x       version %2$d.%1$d,
+  >8    byte    x       wordsize is %d bytes,
+  >9    byte    =0      byteorder is little endian,
+  >9    byte    =1      byteorder is big endian,
+  >9    byte    >1      byteorder is unknown,
+  >12   byte    x       integers are %d bytes,
+  >13   byte    =0      floats are IEEE 754
+  >13   byte    =1      floats are i387 96-bit
+  >13   byte    >1      float type is unknown
+
 =head1 PBC FORMAT 1
 
 All segments are aligned at a 16 byte boundary. All segments share a common
@@ -293,6 +332,12 @@
 Eventually there will be a more complete and useful PackFile specification,
 but this simple format works well enough for now (c. Parrot 0.0.5).
 
+=head1 REFERENCES
+
+=head2 RFC 2083
+
+L<ftp://ftp.rfc-editor.org/in-notes/rfc2083.txt>
+
 =head1 SEE ALSO
 
 F<packfile.c>, F<packfile.h>, F<packout.c>, F<packdump.c>, F<pf/*.c>, and the
@@ -306,7 +351,9 @@
 Variable argument opcodes update by Jonathan Worthington
 C<[EMAIL PROTECTED]>
 
+The header format was mangled by Joshua Hoblitt (JHOBLITT) C<[EMAIL PROTECTED]>
+
 =head1 VERSION
 
-2005.09.19
+2005.09.25
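As a sanity check on the layout in the patch, here is a minimal Python sketch of a reader for the proposed 32 byte header. It is not Parrot code: `PBCHeader` and `parse_pbc_header` are hypothetical names, and the field offsets are taken from the C<file(1)> description (8 = wordsize, 9 = byteorder, 10 = major, 11 = minor, 12 = INT size, 13 = FloatType). One assumption to flag loudly: the diagram labels the padding row at offset 20, but a 10 byte fingerprint starting at byte 14 runs through byte 23, so the sketch treats bytes 24-31 as the padding that must be zero.

```python
# Sketch only, not Parrot's loader. Names are hypothetical; offsets follow
# the file(1) description in the patch.
from dataclasses import dataclass

PBC_MAGIC = b"\xfePBC\r\n\x1a\n"
HEADER_SIZE = 32


@dataclass
class PBCHeader:
    wordsize: int
    byteorder: int      # 0 = little endian, 1 = big endian
    major: int
    minor: int
    int_size: int       # sizeof(INTVAL): 4 or 8
    float_type: int     # 0 = IEEE 754 8 byte double, 1 = 12 byte long double
    fingerprint: bytes  # 10 byte core.ops fingerprint


def parse_pbc_header(data: bytes) -> PBCHeader:
    """Validate and decode the fixed-size header at the start of data."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    if data[:8] != PBC_MAGIC:
        raise ValueError("bad file signature (corrupt or not a PBC file)")

    # Every field is a single byte, so no byte-order handling is needed.
    wordsize, byteorder, major, minor, int_size, float_type = data[8:14]
    if byteorder not in (0, 1):
        raise ValueError(f"unknown byteorder {byteorder}")
    if int_size not in (4, 8):
        raise ValueError(f"INT size must be 4 or 8, got {int_size}")
    if float_type not in (0, 1):
        raise ValueError(f"unknown float type {float_type}")

    fingerprint = data[14:24]
    # ASSUMPTION: padding is bytes 24-31 (see the note above the code).
    if any(data[24:HEADER_SIZE]):
        raise ValueError("header padding MUST be NULL bytes")

    return PBCHeader(wordsize, byteorder, major, minor,
                     int_size, float_type, fingerprint)


# Example: a made-up header for a 32 bit little endian build, version 0.3.
example = PBC_MAGIC + bytes([4, 0, 0, 3, 4, 0]) + bytes(10) + bytes(8)
print(parse_pbc_header(example))
```

Because every header field is a byte or a byte string, the reader never has to look at the byteorder field before decoding the rest of the header, which is the point of keeping the header endian neutral.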