Re: [Q2] (Re: The strings design document)

Larry Wall Fri, 30 Apr 2004 09:02:28 -0700

On Fri, Apr 30, 2004 at 08:38:18AM -0700, Jeff Clites wrote:
: On Apr 28, 2004, at 5:01 AM, Dan Sugalski wrote:
: 
: >At 3:17 AM -0700 4/28/04, Jeff Clites wrote:
: >>On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote:
: >>
: >>>For example, consider the following:
: >>>
: >>>  use Unicode;
: >>>  open FOO, "foo.txt", :charset(latin-3);
: >>>  open BAR, "bar.txt", :charset(big5);
: >>>  $filehandle = 0;
: >>>  while (<>) {
: >>>    if ($filehandle++) {
: >>>      print FOO $_;
: >>>    } else {
: >>>      print BAR $_;
: >>>    }
: >>>    $filehandle %= 2;
: >>>  }
: >>
: >>What's the input record separator here?
: >
: >The filehandle default, which depends on the encoding and character 
: >set of the input data, or so Larry's told me.
: 
: So the nature of my question here is that I assume the input record 
: separator will be set as a string, with something similar to: $/ = "\n" 
: or $/ = "----" or whatever.


Well, it's very good of you to state your assumption out front,
because it happens to be inaccurate.  There is no $/ anymore.
Input record separator is an attribute of the filehandle in Perl 6,
for some definition of attribute, and some definition of filehandle,
which may or may not involve real attributes and/or layers.

And before you ask, chomping is also filehandle dependent.  In fact,
it depends on each line, since if the input record separator is
a pattern, it can match different ways.  So chomping will generally
be done right within the <>, if you've asked for autochomping.
Alternately, the filehandle can mark the string somehow to indicate
where it should be chomped if you decide to chomp it later.
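
As a rough illustration of why chomping depends on each line, here is a
minimal sketch in Python (not Perl 6). The names read_records, chomp and
the autochomp flag are hypothetical, invented for this example. The
separator is a pattern, so the reader records how many characters it
matched for each record and a later chomp can use that length.

```python
import re


def read_records(text, separator, autochomp=False):
    """Split text into records, each ending at a match of the separator pattern.

    Yields (record, sep_len) pairs. sep_len is the number of trailing
    separator characters still attached to the record, or 0 if the record
    was autochomped or had no separator at all.
    """
    pos = 0
    for m in re.finditer(separator, text):
        if m.end() == m.start():
            continue  # skip empty matches; they cannot end a record
        if autochomp:
            # Chomp right away, inside the read.
            yield text[pos:m.start()], 0
        else:
            # Keep the separator, but mark where to chomp later.
            yield text[pos:m.end()], m.end() - m.start()
        pos = m.end()
    if pos < len(text):
        yield text[pos:], 0  # final record without a separator


def chomp(record, sep_len):
    """Chomp later, using the length recorded for this particular record."""
    return record[:len(record) - sep_len] if sep_len else record


# The pattern matches two characters on the first line and one on the second.
records = list(read_records("a\r\nb\nc", r"\r?\n"))
chomped = [chomp(rec, n) for rec, n in records]
```

With the pattern \r?\n, the first record carries a two-character
separator and the second a one-character separator, so a fixed-length
chomp would be wrong for one of them.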

And just as a BTW, if you've asked for autochomping, you'd better use

    for <> {...}

rather than

    while <> {...}

since Perl 6 probably won't do the Perl 5 hack that makes the latter mean

    while defined($_ = <>) {...}

And before you point out that <> in a list context will use up all your
memory, I'll point out that it doesn't in Perl 6.  :-)
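
The message does not spell out why autochomping and while mix badly. One
plausible reading is that an autochomped blank line is an empty string,
which is false, so a loop that tests the value itself stops early. The
sketch below is a Python analogy, not Perl 6, and the function names are
hypothetical. It also shows lines being produced lazily, one at a time,
rather than being read into a list first.

```python
import io


def autochomped_lines(text):
    """Lazily yield lines with the trailing newline already removed."""
    for line in io.StringIO(text):
        yield line.rstrip("\n")


def while_style(lines):
    """Loop on the truth of each value, with no defined-ness test."""
    seen = []
    it = iter(lines)
    while (line := next(it, None)):
        seen.append(line)
    return seen


def for_style(lines):
    """Iterate over the lazy sequence; every line is visited."""
    seen = []
    for line in lines:
        seen.append(line)
    return seen


data = "first\n\nthird\n"
stopped_early = while_style(autochomped_lines(data))
all_lines = for_style(autochomped_lines(data))
```

Here while_style stops at the blank line, while for_style sees all three
lines. Note that Perl also treats the string "0" as false, which Python
does not, so the Perl case has one more way to stop early.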

Offhand, I can't think of any more words to put in your mouth...

: If that's the case, presumably the user won't have to keep resetting it 
: as they open files stored in different encodings, if (from their 
: point of view) they're using the same separator--they'll just set it 
: once. But having it defined as a string would seem to imply that you'll 
: have to transcode as you read to a common representation, in order to 
: find the line endings. That is, if $/ was assigned "latin-1" when it 
: was created, then you'll be forced to transcode to UTF-8 (or something) 
: as you read, right?

$/ is gone.  But if there were a $/, it would do the Right Thing.  :-)

(Which, in Perl 6, is to have consistent Unicode semantics regardless
of the supposed encoding of the string.)
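
A small Python sketch of what consistent Unicode semantics can look like
from the user's side. It assumes the simplest approach, decoding every
input to a common Unicode representation before looking for the
separator. That is an assumption made for illustration, not a statement
of how Perl 6 or Parrot does it, and the names SEPARATOR and
split_records are hypothetical.

```python
# One separator, written once as Unicode text.
SEPARATOR = "\n----\n"


def split_records(raw, encoding):
    """Decode raw bytes, then split on the separator as Unicode text."""
    return raw.decode(encoding).split(SEPARATOR)


# The same logical separator, stored in three different encodings.
latin3 = "ĉapelo\n----\nĝojo".encode("iso8859_3")
big5 = "中文\n----\n資料".encode("big5")
utf16 = "alpha\n----\nbeta".encode("utf_16_le")

latin3_records = split_records(latin3, "iso8859_3")
big5_records = split_records(big5, "big5")
utf16_records = split_records(utf16, "utf_16_le")

# A byte-level search for the ASCII separator fails on the UTF-16 data,
# because every character there occupies two bytes.
found_in_utf16_bytes = SEPARATOR.encode("ascii") in utf16
```

The separator is set once and finds the record boundary in all three
inputs, even though its byte form in the UTF-16 data differs from the
other two.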

Arguably, this discussion should be happening in p6l rather than p6i...

Larry
