This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Distinguish packed binary data from printable strings

=head1 VERSION

  Maintainer: Tim Conrow <[EMAIL PROTECTED]>
  Date: 18 Sept 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 258
  Version: 1
  Status: Developing

=head1 ABSTRACT

Perl should be able to distinguish between printable strings and
packed binary data stored as strings (presumed to not be printable
text) just as it can distinguish between numeric and non-numeric
strings. This would permit greater specificity in programming and thus
better error checking.

=head1 DESCRIPTION

Differentiating between packed strings and printable strings would
permit some useful error checking. A new scalar flag, which I'll call
BOK (for "Binary OK"), would be set in a scalar's flags by any builtin
function or operator that produces a string which is likely to be a
packed data structure of some sort rather than printable text or a
numeric value. These include C<pack>, C<read>, C<sysread>, C<vec>, the
multi-argument form of C<select>, and the string-context form of
operators C<<< >>, <<, |, &, ^, ~ >>>. (There are probably others.)
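
For context, here is a minimal sketch of the ambiguity in Perl 5 as it stands today, which is what the flag is meant to make explicit. It assumes the C<bitwise> feature is not enabled; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Perl 5 picks the bitwise flavor from the operands' current type:
# two strings get a bytewise operation, a numeric operand gets an integer one.
my $numeric = 255 ^ 13;       # integer xor
my $string  = "255" ^ "13";   # bytewise xor of the characters

printf "numeric: %d\n", $numeric;
printf "string:  %s\n", unpack("H*", $string);   # hex dump of the result bytes
```

Nothing in the source text of the second expression says "treat these as packed data"; only the quoting does, which is easy to get wrong by accident.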

With packed strings recognizable to the compiler and interpreter, they
would interact with functions, operators, and other relevant scalar
types as follows:

  use warnings 'packed';
  use strict   'packed';

  $a = pack("A*","abc");   # $a(BOK) = 0; printable string
  $a = pack("a*","123");   # $a(BOK) = 1; packed thing
  print $a;                # Promote via pack("a*",$a), issue warning
  print "$a";              # Stringify $a; same as perl5
  $b = pack($tmpl8,@more_data);     # $b(BOK) = 1
  $c = $a ^ $b;            # String context xor. $c(BOK) = 1
  $d = "255";
  $e = "13";
  $c = $d ^ $e;            # Numeric context xor via promotion to numeric.
  $d = $a ^ 255;           # Error
  $d = $a ^ "zzz";         # Promote via pack("a*","zzz"), but issue warning
  $e = vec($a,12,4);       # OK; $e is numeric; $e(BOK) = 0
  $e = vec(123,1,8);       # Error
  vec(123,1,8) = 1;        # Error
  $binary = "\x04\x12";
  $e = vec($binary,1,8);   # Promote $binary, issue warning
  vec($binary,1,8) = 1;    # Promote $binary, no warning
  select(undef,$rin=0,undef,0.25);  # Error
  select(undef,$rin="",undef,0.25); # Promote $rin, no warning
  $a = pack("a*","123");
  $b = pack("a*","456");
  $c = $a + $b;            # Error
  $c = "$a" + "$b";        # Weird, but OK
  $c = <STDIN>;            # $c(BOK) = 0; $c is a normal string
  sysread FOO,$a,16;       # $a(BOK) = 1; $a is a packed thing
  syswrite BAR,123;        # Error
  syswrite BAR,"123";      # Promote, issue warning
  syswrite BAR,pack "a*","123";     # OK
  if($a) ...               # Always true
  if($a eq "xxx") ...      # Stringify, issue warning
  if("$a" eq "xxx") ...    # Stringify, no warning
  if($a == 123) ...        # Error

The exceptions for C<vec> and C<select> (string arguments auto-promoted
with C<pack("a*",$str)> without warnings) are for backward compatibility.
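
The kind of existing Perl 5 code these exceptions are meant to keep working looks roughly like the sketch below: a bit vector grown from an ordinary empty string. The descriptor number is a made-up value chosen so that the example needs no open filehandle.

```perl
use strict;
use warnings;

# Hypothetical descriptor number, used so the sketch needs no open filehandle.
my $fd = 3;

my $rin = "";             # starts life as an ordinary empty string
vec($rin, $fd, 1) = 1;    # set bit $fd, growing the string as needed

print unpack("b*", $rin), "\n";   # bit vector, lowest bit first
```

Under the proposal, C<$rin> would be promoted silently at the C<vec> assignment rather than triggering a warning.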

I'm sure I haven't covered all the relevant cases, but I hope the
intent is clear: to cut down on the room for accidental use of
un/packed data in inappropriate circumstances and to increase the
ability of the user to be specific regarding intent. In particular,
accidentally using string context bit ops when meaning to use numeric,
or vice versa, would raise a warning, and mixing numeric and packed
arguments would be an error.

If anyone knows of common constructs/idioms which would break under
this scheme and where it's too painful to add C<pack("a*",...)> or
C<"..."> as appropriate ... well I don't have to ask to have them
pointed out, do I? :-) The only cases I've been able to think of are
JAPHs or code samples.

If RFCs 73 and/or 161 end up being adopted, the idea of packed things
being distinct could be extended to allow additional
functionality. E.g.

  $a = pack("a*","\x8f\x01"); # Save the template in the instance data
  if(ref($a) eq "Packed") { ... }
  $b = $a->unpack;            # Use saved template
  $a->STRING = sub { join ",",$_[0]->unpack; };
  print "$a\n"; # Readable
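
Something similar can be approximated in Perl 5 today with an ordinary class and C<overload>. The sketch below is only an illustration of the idea; the C<Packed> package, its constructor, and the C<raw> accessor are all hypothetical names, not an existing module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "Packed" type; not part of any Perl release.
package Packed {
    use overload
        '""'     => sub { join ",", $_[0]->unpack },   # readable form
        fallback => 1;

    # Pack @values with $template and remember the template.
    sub new {
        my ($class, $template, @values) = @_;
        return bless {
            template => $template,
            raw      => pack($template, @values),
        }, $class;
    }

    # Raw packed bytes.
    sub raw { $_[0]{raw} }

    # Unpack using the template saved at construction time.
    sub unpack { CORE::unpack($_[0]{template}, $_[0]{raw}) }
}

my $p = Packed->new("C2", 0x8f, 0x01);
print ref($p), "\n";   # class name
print "$p\n";          # stringifies through the overloaded operator
```

Unlike the proposal, this wraps the data in a reference, so builtins such as C<syswrite> would need C<< $p->raw >> rather than C<$p> itself.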

If RFC 89 is adopted, a variable could be forced to hold only packed
things. E.g.

  my packed $thingie : (template=>"lll");
  $thingie->pack 123,456,789;

... or something like that.

How this would interact with RFCs 142 and 246-250 is TBD, but I see no
outright conflicts right off. This might dovetail well with RFC 159 to
allow un/packing on the fly based on context, but I'm not sure.

=head1 IMPLEMENTATION

I know almost nothing about internals, so this is probably wrong, but
see if I convey my meaning anyway.

=over 4

=item *

With exceptions exemplified above, builtin operators and functions
which operate in a bitwise manner on their string arguments would
behave as follows:

  NOK  POK  BOK
  -------------
   0    0    1   not possible
   0    1    0   promote, warning if use warnings 'packed'
   0    1    1   OK
   1    0    0   error if use strict 'packed', otherwise
                 stringify and promote, warning if
                 use warnings 'packed'
   1    0    1   not possible
   1    1    0   promote, warning if use warnings 'packed'
   1    1    1   OK

=item *

By way of an imperfect analogy, note the similarity between packed
strings having a BOK flag via C<pack> (and others) and regexes having
an ROK flag via C<qr()>. Implementation of BOK is, of course,
considerably simpler.

=item *

When translating code with p526, simply put

  no warnings 'packed';
  no strict   'packed';

at the top.

=back
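
The flag table in the first item above can be read as a small decision function. The sketch below is one possible rendering in plain Perl 5; C<bitwise_action> and its C<strict>/C<warnings> options are hypothetical stand-ins for the proposed pragmas, and the real check would live in the interpreter rather than in user code.

```perl
use strict;
use warnings;

# Hypothetical helper: map a scalar's (NOK, POK, BOK) flags to the action
# the table prescribes for a bitwise builtin. %opt stands in for the
# proposed "use strict 'packed'" and "use warnings 'packed'" pragmas.
sub bitwise_action {
    my ($nok, $pok, $bok, %opt) = @_;

    return "not possible" if $bok && !$pok;   # BOK without POK cannot occur
    return "ok"           if $bok;            # already a packed string

    my $warn = $opt{warnings} ? ", warn" : "";
    return "promote$warn" if $pok;            # plain string, numeric or not

    if ($nok) {                               # pure number
        return "error" if $opt{strict};
        return "stringify, promote$warn";
    }
    return "unspecified";                     # row 0 0 0 is not in the table
}

print bitwise_action(0, 1, 0, warnings => 1), "\n";
print bitwise_action(1, 0, 0, strict   => 1), "\n";
```

The fall-through for an all-zero flag set is my own addition, since the table does not list that row.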


=head1 REFERENCES

RFC  73: All Perl core functions should return objects

RFC  89: Controllable Data Typing

RFC 142: Enhanced Pack/Unpack

RFC 159: True Polymorphic Objects

RFC 161: Everything in Perl becomes an Object

RFC 246-250: Various pack/unpack enhancements

