On Sun, Sep 25, 2005 at 09:43:15PM -0700, Chip Salzenberg wrote:
> On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> > *   The magic number is no longer an opcode outside the header.  It is
> >     now an 8 byte magic string at the beginning of the header.
> 
> I should think four would do, but no matter.

It's so 'large' because of an idea 'borrowed' from the PNG spec.  One or
more of the bytes 0 & 4-7 are likely to be damaged by common transport
encoding errors.  I've changed my proposal to explicitly note this.
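
As a rough illustration (not part of the patch), the sketch below shows why
those bytes are useful.  Only the eight signature bytes come from the
proposal; the helper names and the three simulated transport faults are
assumptions chosen to mirror the cases the PNG rationale describes.

```python
# Sketch only: the signature bytes are from the proposal, everything else
# (function names, simulated faults) is hypothetical.
PBC_MAGIC = b"\xfePBC\r\n\x1a\n"  # 0xfe 'P' 'B' 'C' CR LF ^Z LF


def has_pbc_magic(data: bytes) -> bool:
    """Return True if data starts with the 8 byte PBC signature."""
    return data[:8] == PBC_MAGIC


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def crlf_to_lf(data: bytes) -> bytes:
    """Simulate a text-mode transfer that rewrites CR LF as LF."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Simulate a text-mode transfer that rewrites every LF as CR LF."""
    return data.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    header = PBC_MAGIC + bytes(24)  # signature plus 24 zero bytes
    for fault in (strip_high_bit, crlf_to_lf, lf_to_crlf):
        print(fault.__name__, "detected:", not has_pbc_magic(fault(header)))
```

Byte 0 catches the 7-bit case, the CR LF pair at bytes 4-5 catches CR LF to LF
translation, and the final LF at byte 7 catches LF to CR LF translation.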

> > *   Bytes 20 through 31 are now padding so the core.op fingerprint can
> >     be expanded in the future.
> 
> Marvy.  Important note: All those bytes *must* be zeros in the current
> implementation.  See below.

That was already in my proposal but I've changed the wording to include
I<MUST>.
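
A loader could enforce the zero-padding rule with a check along these lines.
This is a minimal sketch, not the Parrot implementation: the function name is
hypothetical, and the default start offset of 20 is simply the range quoted
above.  If the 10 byte fingerprint that begins at byte 14 turns out to run
through byte 23, the start offset would need to be 24 instead, which is why it
is a parameter here.

```python
HEADER_SIZE = 32  # fixed header length from the proposal


def padding_is_zero(header: bytes, pad_start: int = 20) -> bool:
    """Return True if header[pad_start:32] contains only zero bytes.

    pad_start defaults to 20 as quoted in the proposal (an assumption,
    see the note above).
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("header is shorter than 32 bytes")
    return not any(header[pad_start:HEADER_SIZE])
```

Rejecting non-zero padding now is what keeps those bytes available for a
larger fingerprint or other fields later.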

> > * Do we need to keep the Opcode Type?  It's not clear to me what it's used
> >   for.
> > 
> >    +----------+----------+----------+----------+
> >    |         Opcode Type (Perl = 0x5045524c)   |
> >    +----------+----------+----------+----------+
> 
> I don't think it's useful.  A pbc file is Parrot byte code; if Parrot
> learns to translate .NET, Python, or JVM files, it'll read them in
> their native formats.

Sounds reasonable.  It's been dumped.

> 
> > * Does it make sense to use a fixed size header?  The offset of the first
> > segment could be calculated by multiplying an "offset byte" and the
> > wordsize.
> 
> We don't have to decide that.  A fixed size header now does not
> foreclose the possibility that byte #31 will be that "how many more
> words should be considered part of the header" feature you suggest.

Fair enough.

An updated patch is attached.
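
To show how the single-byte fields in the patch below can be read without any
byte swapping, here is a minimal sketch of a header reader.  The offsets
(8 through 13) and the value constraints are taken from the patch; the class
and function names are hypothetical, and the fingerprint and padding bytes are
deliberately left unparsed.

```python
from typing import NamedTuple

PBC_MAGIC = b"\xfePBC\r\n\x1a\n"  # 8 byte signature from the patch
HEADER_SIZE = 32                  # fixed header length from the patch


class PBCHeader(NamedTuple):
    wordsize: int    # byte 8
    byteorder: int   # byte 9: 0 = little endian, 1 = big endian
    major: int       # byte 10
    minor: int       # byte 11
    int_size: int    # byte 12: sizeof(INTVAL), must be 4 or 8
    float_type: int  # byte 13: 0 = IEEE 754 8 byte double, 1 = 12 byte long double


def parse_pbc_header(data: bytes) -> PBCHeader:
    """Decode the single-byte fields of a PBC header (sketch only)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is shorter than 32 bytes")
    if data[:8] != PBC_MAGIC:
        raise ValueError("bad magic string")
    # Iterating over a bytes slice yields plain integers, one per byte,
    # so no endianness handling is needed here.
    header = PBCHeader(*data[8:14])
    if header.byteorder not in (0, 1):
        raise ValueError("unknown byteorder")
    if header.int_size not in (4, 8):
        raise ValueError("INT size must be 4 or 8")
    return header
```

Only after these bytes are known does a reader need to decide how to decode
the native-order words that follow the header.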

-J

--
Index: docs/parrotbyte.pod
===================================================================
--- docs/parrotbyte.pod (revision 9235)
+++ docs/parrotbyte.pod (working copy)
@@ -7,8 +7,33 @@
 
 =head1 Format of the Parrot bytecode
 
+Parrot's bytecode format consists of a small, endian-neutral header region
+followed by a series of segments.  ALL words (non-bytes) following the header
+are stored in native order, unless otherwise specified.
+
+=head1 PBC Header
+
+The PBC header is a fixed 32 bytes in length.  Header values are all encoded as
+either a single byte or a string so that the header can be parsed without
+having to consider the endianness of the data.
+
   0          1          2          3
   +----------+----------+----------+----------+
+  | 0xfe       0x50       0x42       0x43     |
+  +----------+----------+----------+----------+
+  | 0x0d       0x0a       0x1a       0x0a     |
+  +----------+----------+----------+----------+
+
+The header begins with an eight byte I<File Signature> or I<Magic String>.
+This is equivalent to the C strings C<\376PBC\r\n\032\n> (ASCII) and
+C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a> sans the terminating C<NULL> bytes.  Bytes
+0 and 4-7 are designed to catch common types of file corruption caused by
+transport encoding mechanisms (for example, FTP ASCII transfers).  This format
+was inspired by the PNG Specification.  Please see RFC 2083 for an explanation
+of the advantages of this strategy.
+
+  8          9          10         11
+  +----------+----------+----------+----------+
   | Wordsize | Byteorder|  Major   |  Minor   |
   +----------+----------+----------+----------+
 
@@ -20,7 +45,7 @@
 
 Byteorder currently supports two values: (0-Little Endian, 1-Big Endian)
 
-  4          5
+  12         13         14
   +----------+----------+----------+----------+
   | INT size | FloatType|  10 Byte  ...       |
   +----------+----------+----------+----------+
@@ -29,26 +54,40 @@
   |           core.ops is here                |
   +----------+----------+----------+----------+
 
-INT size (sizeof(INTVAL)) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
+INT size (C<sizeof(INTVAL)>) must be 4 or 8.  FloatType 0 is IEEE 754 8 byte
 double, FloatType 1 is i386 little endian 12 byte long double.
 
-  16
+
+  20         21         22         23
   +----------+----------+----------+----------+
-  |         Parrot Magic = 0x 13155a1         |
+  |                  padding                  |
   +----------+----------+----------+----------+
-
-Magic is stored in native byteorder. The loader uses the byteorder header to
-convert the Magic to verify. More specifically, ALL words (non-bytes) in the
-bytecode file are stored in native order, unless otherwise specified.
-
-  20*
+  |                                           |
   +----------+----------+----------+----------+
-  |         Opcode Type (Perl = 0x5045524c)   |
+  |                                           |
   +----------+----------+----------+----------+
 
-The asterisk for the offset states, from here we have opcodes. The given
-offsets are for 32 bit opcode types only.
+Following the core.ops fingerprint, the header I<MUST> be padded with C<NULL>
+bytes to a total length of 32 bytes.
 
+All words following the header will be interpreted as opcodes.
+
+=head2 Magic Description
+
+The following is a C<file(1)> description of the PBC header format.
+
+    0       string  \xfe\x50\x42\x43\x0d\x0a\x1a\x0a Parrot Bytecode (PBC)
+    >10     byte    x
+    >11     byte    x   version %2$d.%1$d,
+    >8      byte    x   wordsize is %d bytes,
+    >9      byte    =0  byteorder is little endian,
+    >9      byte    =1  byteorder is big endian,
+    >9      byte    >1  byteorder is unknown,
+    >12     byte    x   integers are %d bytes,
+    >13     byte    =0  floats are IEEE 754
+    >13     byte    =1  floats are i387 96-bit
+    >13     byte    >1  float type is unknown
+
 =head1 PBC FORMAT 1
 
 All segments are aligned at a 16 byte boundary. All segments share a common
@@ -293,6 +332,12 @@
 Eventually there will be a more complete and useful PackFile specification, but
 this simple format works well enough for now (c. Parrot 0.0.5).
 
+=head1 REFERENCES
+
+=head2 RFC 2083
+
+L<ftp://ftp.rfc-editor.org/in-notes/rfc2083.txt>
+
 =head1 SEE ALSO
 
 F<packfile.c>, F<packfile.h>, F<packout.c>, F<packdump.c>, F<pf/*.c>, and the
@@ -306,7 +351,9 @@
 
 Variable argument opcodes update by Jonathan Worthington C<[EMAIL PROTECTED]>
 
+The header format was mangled by Joshua Hoblitt (JHOBLITT) C<[EMAIL PROTECTED]>
+
 =head1 VERSION
 
-2005.09.19
+2005.09.25
 
