On Sun, Sep 25, 2005 at 12:24:52PM -0700, Chip Salzenberg via RT wrote:
> I think the right answer is to use a magic string rather than a
> magic number.

Leo and I have been discussing this on #parrot and we've come to the same
conclusion.  Attached is a possible patch for parrotbyte.pod that
implements a number of changes to the header region.  It:

*   Expands the header to 32 bytes.
*   Replaces the magic number, previously an opcode-sized word following
    the header, with an 8-byte magic string at the beginning of the header.
*   Reserves bytes 24 through 31 as padding so that the core.ops
    fingerprint can be expanded in the future.
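Here is a minimal C sketch of how a loader might check the proposed header. It assumes the layout in the attached patch: the 8-byte magic string at offsets 0-7, Wordsize/Byteorder/Major/Minor at 8-11, INT size and FloatType at 12-13, the 10-byte core.ops fingerprint at 14-23, and padding at 24-31. The names (pbc_header, pbc_parse_header, pbc_magic) are hypothetical and not taken from packfile.c. Because every field is a single byte or a byte string, nothing needs byte swapping.

```c
#include <stddef.h>
#include <string.h>

#define PBC_HEADER_SIZE 32

/* Hypothetical names; layout follows the proposed parrotbyte.pod. */
static const unsigned char pbc_magic[8] =
    { 0xfe, 'P', 'B', 'C', '\r', '\n', 0x1a, '\n' };

typedef struct pbc_header {
    unsigned char wordsize;        /* offset 8 */
    unsigned char byteorder;       /* offset 9: 0 = little, 1 = big endian */
    unsigned char major;           /* offset 10 */
    unsigned char minor;           /* offset 11 */
    unsigned char intsize;         /* offset 12: sizeof(INTVAL), 4 or 8 */
    unsigned char floattype;       /* offset 13: 0 or 1 */
    unsigned char fingerprint[10]; /* offsets 14-23: core.ops fingerprint */
} pbc_header;

/* Returns 0 on success, -1 if the buffer is shorter than the header,
 * the magic string does not match, or a field has an unsupported value.
 * Only single bytes are read, so host endianness never matters here.
 * The padding (offsets 24-31) is deliberately not checked so that a
 * future, longer fingerprint is not rejected. */
static int pbc_parse_header(const unsigned char *buf, size_t len,
                            pbc_header *h)
{
    if (len < PBC_HEADER_SIZE)
        return -1;
    if (memcmp(buf, pbc_magic, sizeof pbc_magic) != 0)
        return -1;

    h->wordsize  = buf[8];
    h->byteorder = buf[9];
    h->major     = buf[10];
    h->minor     = buf[11];
    h->intsize   = buf[12];
    h->floattype = buf[13];
    memcpy(h->fingerprint, buf + 14, sizeof h->fingerprint);

    if (h->byteorder > 1 || h->floattype > 1)
        return -1;
    if (h->intsize != 4 && h->intsize != 8)
        return -1;
    return 0;
}
```

The magic comparison comes first, so a truncated or non-PBC file is rejected before any field is interpreted.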

Remaining issues are:

* Do we need to keep the Opcode Type?  It's not clear to me what it's used
for.

   +----------+----------+----------+----------+
   |         Opcode Type (Perl = 0x5045524c)   |
   +----------+----------+----------+----------+

* Does it make sense to use a fixed-size header?  Alternatively, the offset
of the first segment could be calculated by multiplying an "offset byte" by
the wordsize.  That would allow more than enough room for growth (roughly
1KB even with a 4-byte wordsize) and would ensure that the first segment is
always at least 32-bit aligned.  Leo and I disagree on this, but I think it
makes sense: additional metadata could then be added to the header without
breaking backwards compatibility.
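A rough C sketch of the offset-byte arithmetic follows. It assumes a hypothetical header byte, called offset_units here, whose location is not decided. The first segment would start at offset_units * wordsize bytes. With a 4-byte wordsize the largest reachable offset is 255 * 4 = 1020 bytes, and since the offset is always a whole multiple of the wordsize (4 or 8), the first segment stays at least 32-bit aligned. Both function names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical: offset_units is an "offset byte" read from the header,
 * wordsize is the header's Wordsize byte (4 or 8). The result is the
 * byte offset of the first segment from the start of the file. */
static size_t pbc_first_segment_offset(unsigned char offset_units,
                                       unsigned char wordsize)
{
    return (size_t)offset_units * wordsize;
}

/* Nonzero if the first segment starts at or after byte 32, i.e. it
 * cannot overlap the 32 bytes of header fields defined so far. */
static int pbc_offset_is_valid(unsigned char offset_units,
                               unsigned char wordsize)
{
    return pbc_first_segment_offset(offset_units, wordsize) >= 32;
}
```

For example, an offset byte of 8 with a 4-byte wordsize places the first segment immediately after the current 32-byte header. An older reader that does not understand later header fields could still jump straight to the first segment.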

-J

--
Index: docs/parrotbyte.pod
===================================================================
--- docs/parrotbyte.pod (revision 9235)
+++ docs/parrotbyte.pod (working copy)
@@ -7,8 +7,30 @@
 
 =head1 Format of the Parrot bytecode
 
+All words (multi-byte values) in the bytecode file are stored in native byte
+order unless otherwise specified.
+
+=head1 PBC Header
+
+A PBC file starts with a header that is a fixed 32 bytes in length.  Every
+header value is encoded as either a single byte or a byte string, so the
+header can be parsed without considering the endianness of the data.
+
   0          1          2          3
   +----------+----------+----------+----------+
+  | 0xfe       0x50       0x42       0x43     |
+  +----------+----------+----------+----------+
+  | 0x0d       0x0a       0x1a       0x0a     |
+  +----------+----------+----------+----------+
+
+The header begins with an eight-byte I<Magic String>.  This is equivalent to
+the C string C<"\376PBC\r\n\032\n"> (octal escapes), or
+C<"\xfe\x50\x42\x43\x0d\x0a\x1a\x0a"> (hex escapes), without the terminating
+NUL byte.  This format was inspired by the PNG Specification; see RFC 2083
+for an explanation of the advantages of this strategy.
+
+  8          9          10         11
+  +----------+----------+----------+----------+
   | Wordsize | Byteorder|  Major   |  Minor   |
   +----------+----------+----------+----------+
 
@@ -20,7 +42,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4          5
+  12         13         14
   +----------+----------+----------+----------+
   | INT size | FloatType|  10 Byte  ...       |
   +----------+----------+----------+----------+
@@ -29,19 +51,43 @@
   |           core.ops is here                |
   +----------+----------+----------+----------+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  24         25         26         27
   +----------+----------+----------+----------+
-  |         Parrot Magic = 0x 13155a1         |
+  |                  padding                  |
   +----------+----------+----------+----------+
+  |                                           |
+  +----------+----------+----------+----------+

-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify. More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
+Following the core.ops fingerprint, bytes 24 through 31 of the header are
+padding.  They are filled with NUL (C<0x00>) bytes so that the header is
+exactly 32 bytes in length, leaving room for the fingerprint to be expanded
+in the future.
 
-  20*
+All words following the header are interpreted as opcodes.
+
+=head2 Magic Description
+
+The following is a C<file(1)> magic description of the PBC header format.
+
+    0       string  \xfe\x50\x42\x43\x0d\x0a\x1a\x0a Parrot Bytecode (PBC)
+    >10     byte    x   version %d
+    >11     byte    x   \b.%d,
+    >8      byte    x   wordsize is %d bytes,
+    >9      byte    =0  byteorder is little endian,
+    >9      byte    =1  byteorder is big endian,
+    >9      byte    >1  byteorder is unknown,
+    >12     byte    x   integers are %d bytes,
+    >13     byte    =0  floats are IEEE 754
+    >13     byte    =1  floats are i387 96-bit
+    >13     byte    >1  float type is unknown
+
+FIXME: do we still need this Opcode?
+
+  32*
   +----------+----------+----------+----------+
   |         Opcode Type (Perl = 0x5045524c)   |
   +----------+----------+----------+----------+
@@ -293,6 +339,12 @@
 Eventually there will be a more complete and useful PackFile specification, but
 this simple format works well enough for now (c. Parrot 0.0.5).
 
+=head1 REFERENCES
+
+=head2 RFC 2083
+
+L<ftp://ftp.rfc-editor.org/in-notes/rfc2083.txt>
+
 =head1 SEE ALSO
 
 F<packfile.c>, F<packfile.h>, F<packout.c>, F<packdump.c>, F<pf/*.c>, and the
@@ -306,7 +358,9 @@
 
 Variable argument opcodes update by Jonathan Worthington C<[EMAIL PROTECTED]>
 
+The header format was mangled by Joshua Hoblitt (JHOBLITT) C<[EMAIL PROTECTED]>
+
 =head1 VERSION
 
-2005.09.19
+2005.09.25
 
