Re: DRAFT RFC: Enhanced Pack/Unpack

Glenn Linderman Wed, 02 Aug 2000 12:16:56 -0700
Edwin,

This writeup certainly is a great first draft for this RFC.  I'll have to track down
those references.

I was surprised by the read/write operations, but have no objection to them.
New/get/set and the individual data member access functions are the critical pieces,
as the I/O could be done to normal variables, but it would take more steps that way,
so read/write are nice enhancements.

A few comments follow, in the context of the RFC; the parts I'm not commenting on
are elided for brevity.

Edwin Wiles wrote:

> =head3  Data Definition
>
>         While we could use a C-ish 'struct' syntax, that would imply a
>         whole new parser capability.  Something built up out of
>         existing perl syntax would be easier to implement?
>
>         For example, assume a set of C structs as follows:
>
>         struct foo {
>                int bar;
>                int baz;
>                int count;
>                };
>
>         Followed by 'count' copies of:
>
>         struct stroff {
>                int length;
>                int offset;
>                };
>
>         Followed by 'count' variable length, not necessarily null
>         terminated collections of bytes.  Possibly strings, possibly
>         not, but we'll consider them strings for now.
>
>         [ 'bar', 'i', 'baz', 'i', 'count', 'i' ]

Three kinds of comment here.

The first kind: this list looks very paired, can we make that more explicit?

The second kind: Many of the "pack" code characters imply both type and length, and
that is useful and concise (vs. 2 parameters), and we could use those same character
codes here... but I'd rather see something more readable than single character
stuff... if other RFCs come up with sized type names for Perl variables, those same
names could be used here... that discussion seems to be continuing, so I don't know
how it will end.  Another alternative would be to borrow type names from C, but
theirs are not fixed size.  Another alternative would be to borrow type names from
some DDL or IDL type language.

The third kind: to be really, really flexible (want to read a FAT-12 data
structure?), bit sizes might be appropriate for integers and pad fields.  And a
(non-IEEE) floating point number requires quite a few parameters to define: exponent
size, exponent base, mantissa size, implied leading 1 or not, etc.  Doubt we want to
go that far in a basic implementation, but maybe some hooks would allow it.
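As a concrete illustration of why bit-sized fields matter, here is a minimal
sketch (not part of the RFC) of reading 12-bit entries from a FAT-12 style table,
where two entries share three bytes.  The helper name fat12_entry is invented for
this example; with today's pack/unpack the shifting and masking has to be done by
hand:

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n-th 12-bit entry of a FAT-12 style table.
sub fat12_entry {
    my ($table, $n) = @_;
    my $offset = int($n * 3 / 2);                         # byte where the entry starts
    my $word   = unpack 'v', substr($table, $offset, 2);  # 16-bit little-endian
    return $n % 2 ? $word >> 4 : $word & 0x0FFF;          # odd: high 12 bits, even: low 12
}

my $table = pack 'C3', 0x23, 0x61, 0x45;    # holds the entries 0x123 and 0x456
printf "%03X %03X\n", fat12_entry($table, 0), fat12_entry($table, 1);   # 123 456
```

A definition such as ['integer', 12] would let the Structure feature do this
arithmetic itself.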

So here are some ideas for consideration that attempt to address all the above points:

1) make the list use => separators: the above example would become

   [ 'bar' => 'i', 'baz' => 'i', 'count' => 'i' ]

2) use bigger names, and support standard types simply.  Here are some examples,
starting with the same one, but using invented sized type names:

  [ 'bar' => 'integer32', 'baz' => 'integer32', 'count' => 'integer32' ]

  [ 'var1' => 'int32', 'var2' => 'int16', 'var3' => 'int8' ]

3) allow definitions to support odd-sized standard types

  [ 'bar' => ['integer', 32], 'baz' => ['integer', 32],
    'count' => ['integer', 32] ]

  [ 'var1' => 'int32', 'var2' => ['integer', 12 ], 'var3' => ['integer', 20]]

4) allow hooks to support non-standard types.  The idea here is for the user to write
two subs to convert a funny type to or from a scalar perl variable (could be a ref,
of course) from its linear binary representation, and then tell the Structure
feature/module about them via a method of some sort.  Then they can be converted just
like, and mixed with, regular types.

  sub Structure::define ( <type name>, <frombinarysub>, <tobinarysub> )

  sub from_funny ( <type_params>, <binary_var>, <bit_offset> )
  # returns ( <next_bit_offset>, <funny_var> )

  sub to_funny ( <type_params>, <binary_var>, <bit_offset>, <funny_var> )
  # returns ( <next_bit_offset> )

  Structure::define ( 'funny', \&from_funny, \&to_funny );

  [ 'var1' => 'int32', 'var2' => ['funny', 6, 18, 12]]

  <type_params> is the reference to the array given in the definition; in this
example, it would be ['funny', 6, 18, 12]  (it appears that the type 'funny' has
three parameters to define its storage characteristics).

Here's 'funny2' with no parameters for its storage characteristics:

  sub from_funny2 ( <type_params>, <binary_var>, <bit_offset> )
  # returns ( <next_bit_offset>, <funny2_var> )

  sub to_funny2 ( <type_params>, <binary_var>, <bit_offset>, <funny2_var> )
  # returns ( <next_bit_offset> )

  Structure::define ( 'funny2', \&from_funny2, \&to_funny2 );

  [ 'var1' => 'int32', 'var2' => 'funny2']
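The hook idea in point 4 could be sketched roughly as follows.  This is only one
possible reading of the proposal, not a definitive implementation: the helper
hooks_for, the example type 'uint' (an unsigned integer of a given bit width,
most significant bit first), and the choice to pass the binary buffer by
reference are all assumptions made for illustration.

```perl
use strict;
use warnings;

package Structure;    # hypothetical module, named as in the message

my %types;            # type name => [ from-binary sub, to-binary sub ]

sub define {
    my ($name, $from, $to) = @_;
    $types{$name} = [ $from, $to ];
}

# Normalize a type spec: a bare name such as 'funny2' carries no parameters.
sub hooks_for {
    my ($spec) = @_;
    my $params = ref $spec ? $spec : [ $spec ];
    my $hooks  = $types{ $params->[0] } or die "unknown type '$params->[0]'";
    return ($params, @$hooks);
}

package main;

# Illustrative user type: unsigned integer of $params->[1] bits, MSB first.
# Assumption: the buffer is passed by reference so to_uint can modify it.
sub from_uint {
    my ($params, $buf_ref, $bit_offset) = @_;
    my $width = $params->[1];
    my $bits  = substr(unpack('B*', $$buf_ref), $bit_offset, $width);
    return ($bit_offset + $width, oct("0b$bits"));
}

sub to_uint {
    my ($params, $buf_ref, $bit_offset, $value) = @_;
    my $width = $params->[1];
    my $bits  = unpack('B*', $$buf_ref);
    my $need  = $bit_offset + $width - length($bits);
    $bits .= '0' x $need if $need > 0;                 # grow the buffer as needed
    substr($bits, $bit_offset, $width, sprintf('%0*b', $width, $value));
    $$buf_ref = pack('B*', $bits);
    return $bit_offset + $width;
}

Structure::define('uint', \&from_uint, \&to_uint);

# Write a 12-bit and a 20-bit field back to back, then read the first one.
my $buf = '';
my $pos = 0;
for my $field ([ ['uint', 12], 0xABC ], [ ['uint', 20], 0x12345 ]) {
    my ($spec, $value) = @$field;
    my ($params, undef, $to) = Structure::hooks_for($spec);
    $pos = $to->($params, \$buf, $pos, $value);
}
print unpack('H*', $buf), "\n";    # abc12345

my ($next, $first) = from_uint(['uint', 12], \$buf, 0);
printf "%X, next bit offset %d\n", $first, $next;    # ABC, next bit offset 12
```

Returning the next bit offset from each hook is what lets odd-sized fields be
chained without the caller knowing their widths.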


>        This would do for the first structure.  Arrays are used rather
>         than hashes to guarantee data order.
>
>         [ 'length', 'i', 'offset', 'i' ]

Or   ['length' => 'int32', 'offset' => 'int32' ]

>         Will do for the second structure.  Now how do we join these
>         two?
>
>         [ 'bar', 'i', 'baz', 'i', 'count', 'i',
>           repeat( 'count', [ 'length', 'i', 'offset', 'i' ] ) ]

Or  ['bar' => 'int32', 'baz' => 'int32', 'count' => 'int32',
           'struct_2' => ['array', 'count',
                          [ 'length' => 'int32', 'offset' => 'int32' ]]]

Here's the above structure, without a count, for when there are a known fixed number
of the second structures:

    ['bar' => 'int32', 'baz' => 'int32',
           'struct_2' => ['array', 4, [ 'length' => 'int32', 'offset' => 'int32' ]]]
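To make the nested definitions above concrete, here is a rough sketch of a
decoder that walks such a definition and pulls values out of a binary string.
Nothing here is specified by the RFC: the function name decode_type, the mapping
of the invented type names to little-endian pack codes, and the rule that a
non-numeric array count names an earlier field of the same record are all
assumptions.  The 'null' count used for null-terminated strings is not handled.

```perl
use strict;
use warnings;

# Assumed mapping from the invented sized type names to real pack() codes
# (signed, little-endian; the message does not specify a byte order).
my %CODE = ( int8 => 'c', int16 => 's<', int32 => 'l<' );

# Decode one value of $type from $buf at byte $pos.  $rec holds the fields
# already decoded in the enclosing record, so arrays can look up their count.
# Returns (decoded value, next byte position).
sub decode_type {
    my ($type, $buf, $pos, $rec) = @_;

    if (!ref $type) {                                  # simple type name
        my $code  = $CODE{$type} or die "unknown type '$type'";
        my $value = unpack $code, substr($buf, $pos);
        return ($value, $pos + length(pack($code, 0)));
    }

    if ($type->[0] eq 'array') {                       # ['array', count, elem]
        my (undef, $count, $elem) = @$type;
        $count = $rec->{$count} unless $count =~ /^\d+$/;
        my @items;
        for (1 .. $count) {
            (my $item, $pos) = decode_type($elem, $buf, $pos, $rec);
            push @items, $item;
        }
        return (\@items, $pos);
    }

    my %fields;                                        # nested structure: name/type pairs
    for (my $i = 0; $i < @$type; $i += 2) {
        ($fields{ $type->[$i] }, $pos)
            = decode_type($type->[$i + 1], $buf, $pos, \%fields);
    }
    return (\%fields, $pos);
}

my $def = [ 'bar' => 'int32', 'baz' => 'int32', 'count' => 'int32',
            'struct_2' => ['array', 'count',
                           [ 'length' => 'int32', 'offset' => 'int32' ]] ];

my $data = pack 'l<*', 7, 8, 2, 10, 100, 20, 200;
my ($rec, $end) = decode_type($def, $data, 0, {});

print "count=$rec->{count}, second offset=$rec->{struct_2}[1]{offset}\n";
# count=2, second offset=200
```

One consequence of using plain array refs for both structures and arrays is that
a structure whose first field is literally named 'array' would be ambiguous; a
real design would need a way around that.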

>         Okay, that looks like it might work, now add in the strings
>         referenced by length and offset.  [Ideas anyone?]

OK, here's a (Forth or Pascal or BASIC) counted string:

     $cntstr = new Structure ([ 'count' => 'int8',
                                'string' => ['array', 'count', 'char' ]]);

Here's a (C or C++) null-terminated string:

     $nulstr = new Structure (['string' => ['array', 'null', 'char' ]]);
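For comparison, the existing pack/unpack can already express these two simple
string layouts, using the 'C/a*' (length-prefixed) and 'Z*' (null-terminated)
templates.  This short sketch only shows that baseline behaviour; it says nothing
about how the proposed Structure syntax would be implemented:

```perl
use strict;
use warnings;

my $counted = pack 'C/a*', 'hello';    # one length byte, then the bytes
my $nulterm = pack 'Z*',   'hello';    # the bytes, then a trailing null

my ($from_counted) = unpack 'C/a*', $counted;
my ($from_nulterm) = unpack 'Z*',   $nulterm;

printf "%d bytes and %d bytes\n", length $counted, length $nulterm;  # 6 bytes and 6 bytes
print "$from_counted $from_nulterm\n";                               # hello hello
```

What the terse templates do not give you is named fields, which is the gap the
definitions above are meant to fill.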

From the above examples, it becomes clear that the 'array' type requires two
parameters, a count, and a type.  It also becomes clear that the type parameters can
be the name of a type, or an arrayref containing the parameters describing a complex
type.  Here's a fixed size array of 'funny2':

    $funny2_array_var = new Structure ([ 'funny2_array' => ['array', 33, 'funny2' ]]);

   @funny2_array = $funny2_array_var -> get ( 'funny2_array' ); # whole array
   print $#funny2_array;    # prints '32'

   $funny2_var = $funny2_array_var -> get ( 'funny2_array', 26 ); # one element

While type 'array' could probably be defined as a user type, that would restrict some
nice features:

Array of char can be gotten two ways, for two usages:

   $var = $nulstr -> get ( 'string' );  # turns into Perl string value

   @char_array = $nulstr -> get ( 'string' );  # array of characters

The latter example leads directly to a way to obtain an array of int, into a Perl
array.
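The two ways of getting an array of char could be built on Perl's wantarray,
which tells a sub whether it was called in list or scalar context.  The sketch
below is a simplification for illustration only: its constructor takes already
decoded field data rather than a structure definition, and the class is a
stand-in, not the proposed module.

```perl
use strict;
use warnings;

package Structure;    # hypothetical stand-in for the proposed class

sub new {
    my ($class, %fields) = @_;           # field name => array ref of elements
    return bless { %fields }, $class;
}

sub get {
    my ($self, $name) = @_;
    my $chars = $self->{$name};
    # List context: the individual elements.  Scalar context: one joined string.
    return wantarray ? @$chars : join('', @$chars);
}

package main;

my $nulstr = Structure->new(string => [ split //, 'perl' ]);

my $var        = $nulstr->get('string');    # 'perl'
my @char_array = $nulstr->get('string');    # ('p', 'e', 'r', 'l')

print "$var has ", scalar(@char_array), " characters\n";   # perl has 4 characters
```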

These examples are not meant to preclude Class::Class-style get/set functions; I'm
just not sure how they work in detail, as yet.

Major extension idea:

It would also, perhaps, be nice to use this same syntax to "linearize" any Perl
variable, along the lines of Data::Dumper, or Storable.  Bringing that sort of
functionality into the syntax, would add to Perl some extremely flexible binary data
manipulation capabilities.  If it becomes part of the core, it could also be
extremely fast for standard data types.

--
Glenn
=====
There  are two kinds of people, those
who finish  what they start,  and  so
on...                 -- Robert Byrne


