Re: String Literals, take 1

Joseph Ryan Sat, 30 Nov 2002 00:09:35 -0800

From: James Mastros <[EMAIL PROTECTED]>

On 11/29/2002 7:40 PM, Joseph Ryan wrote:

(*Note: Please reply to [EMAIL PROTECTED], as this is only a temporary email address)

- Reference and object stringification haven't been defined.
I believe it goes something like this:
All objects define a .AS_STRING method. This method is called to stringify the object. The builtin types have builtin .AS_STRINGs; the primitive types autopromote. All stringification thus follows the same logical model, even if the implementation doesn't.

So, should all scalars stringify as a Str?  Or should scalar be the
default type, which all other types inherit from, redefining their
AS_STRING as necessary?

The default .AS_STRING for Strings is obvious. Int and Num stringify to a decimal number (using the e exponential form if it is shorter?).

I hope not; if someone wants a number in e form, they should specify it
themselves.

- If References interpolate in some sort of readable way, how do
 multi-leveled references interpolate, and how do self-referring
 data structures interpolate?
Multi-leveled: The outer .AS_STRING calls its members' .AS_STRINGs. Circular: I have no idea.

Perhaps only first level references should stringify nicely, and inner
references stringify perl5 style.  I think that if Data::Dumper style
stringification is wanted, then a C<< use Data::Dumper; >> shouldn't
anger too many people.  This would solve circular referencing, at
least.
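
A rough Python model of the first-level-only idea above, purely for illustration. The function name as_string and the LIST(0x...) output format are assumptions that imitate Perl 5's ARRAY(0x...) style; nothing here reflects a decided Perl 6 behaviour.

```python
def as_string(value):
    """Stringify the top level readably; show nested containers as opaque references."""
    def opaque(ref):
        # Imitates Perl 5's ARRAY(0x...) form; the exact format is an assumption.
        return f"{type(ref).__name__.upper()}(0x{id(ref):x})"

    if not isinstance(value, list):
        return str(value)
    parts = []
    for item in value:
        # Inner containers are never descended into, so cycles cannot recurse.
        parts.append(opaque(item) if isinstance(item, (list, dict)) else str(item))
    return " ".join(parts)


cycle = [1, 2]
cycle.append(cycle)       # self-referring structure
print(as_string(cycle))   # prints "1 2 LIST(0x...)" with some address
```

Because only one level is ever expanded, the self-referring case terminates without any cycle bookkeeping.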

Possibly misleading: Leads people to think that a string is an array of chars, like in C? (I don't think so, but new-to-perl people might. I'm being nitpicky.)

Good point, I didn't even think about that.

This should be moved to general documentation for pick-a-delimiter functions. Also, a rigorous definition of a pair of delimiters might be nice. I'll look at unicode.org and see if I can find something out.

I don't think so; there are only 5 operators that can do this:
q//, qq//, tr///, s///, and m//; however, regular expressions have
different delimiter rules (specifically, they can't use "()"), and are
also parsed differently.  I think they should be explained in the regular
expression section.

Is the \qq{} construct a pick-a-delimiter thing? I think it should be, for parallelism with the qq() operator.

I hope not, {} is much easier to read (and parse).  Anyways, Apoc 2
only mentions {}.  Any delimiter could have been implied, but I just
didn't read it that way.

I thought it was named <<foo bar baz>> or «foo bar baz» or qw(). (That middle one should be U+00AB and U+00BB, \N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK} and \N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}.) Additionally, I'm fairly certain the Unicode ops could be either direction. I think there was a reason for that, but I don't remember what.

It was <> in Apoc 2; however, if it changed in a discussion on
perl6-language, I'm unaware of it.

   \t            tab
\U{9}
   \n            newline
\U{10}
   \r            return
\U{13}
   \f            form feed
\U{12}
   \b            backspace
\U{8}
   \a            alarm (bell)
\U{7}
   \e            escape
\U{27}
   \b10        binary char
   \o33        octal char
Is this true? We changed the numeric octal shorthand base to 0c777, so what sense does \o for octal characters make? (Unfortunately, we can't use \c, since that's taken for control characters.) IIRC, somebody had mentioned just getting rid of \o altogether. People don't think in octal.

Last time I remember an official decision, it was \o; however, I've
been a bit out of the loop the last couple of days, so I could be
wrong here.

   \x1b        hex char
Specifically, \x must be followed by exactly two hex digits, or do we DWIM with one (i.e., if there is only one character in 0-9A-Fa-f after the \x, do we

Do we what? :)
Perl5's semantics are to treat it as a 1 digit number, so \x1 becomes
\x01.  That's good enough for me.
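
The one-digit rule just described can be modelled in a few lines of Python. This is only a sketch of the semantics, not parser code from any Perl implementation; the names HEX_ESCAPE and expand_hex_escapes are made up for the example.

```python
import re

# \x followed by one or two hex digits; a single digit is read as a one-digit number.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")

def expand_hex_escapes(text):
    """Replace each \\xH or \\xHH escape with the character it names."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

assert expand_hex_escapes(r"\x1") == "\x01"        # one digit: same as \x01
assert expand_hex_escapes(r"\x1b[0m") == "\x1b[0m"  # two digits are taken when present
```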

   \N{name}    named Unicode character
Suggested extension: \U{13#ac05} is Unicode character number ac05 in base 13. Any perl expression will do inside the {}s.

I don't think so; any text within a {} should be treated like an
interpolating string.  (or a non-interpolating string, in the case of
\q{} and \Q{})

   \Q{}        Escape all characters that need escaping
               within the current string (except "}")
Escape all characters in [^A-Za-z0-9] within the {}'d part with backslashes. ("that need escaping" is inexact.)

Yeah, you're right.
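
For concreteness, here is the stricter wording modelled in Python. It follows the character class exactly as written in the message, [^A-Za-z0-9], so it also escapes the underscore; the function name quote_meta is an assumption for the example.

```python
import re

def quote_meta(text):
    """Backslash-escape every character outside A-Z, a-z and 0-9."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", text)

print(quote_meta("a.b*c"))   # prints a\.b\*c
```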

Within an interpolated string, interpolation of expressions can be
stopped by \Q.

(Which acts somewhat like a non-breaking space.)

<jon_stewart>Say wha?</jon_stewart>

Do you mean that \Q acts zero-width?

The collected standard output of the
command is returned; standard error is unaffected.
Standard error is passed on to the standard error of the perl process? (Or should we leave it at "unaffected", and let the user guess what that means on their OS -- I'm betting I'm being unix-centric here -- OS<=9 has no concept of "standard error" -- or "standard output", for that matter... IIRC, again.)

I don't know; that was straight out of perl 5.8.0's perlop.

In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, returns a list of lines (however
you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty
list if the command failed.
I don't think $/ still exists, at least as such. In fact, I think we should probably just say "returns an iterator on the standard output of the command", and leave it at that.

You're probably right; I basically copied the entire backquotes section
out of perl5's perlop; I should have corrected it more thoroughly, but I
just figured it to be correct.
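
To make the "iterator on the standard output" suggestion concrete, here is a small Python model. It is a sketch of the idea only: the name command_lines is hypothetical, and the running Python interpreter stands in for an arbitrary command so the example is portable.

```python
import subprocess
import sys

def command_lines(argv):
    """Yield the command's standard output one line at a time; stderr is inherited."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line

# The child's standard error is not captured, so it goes wherever the parent's goes.
for line in command_lines([sys.executable, "-c", "print('one'); print('two')"]):
    print(line, end="")
```

The caller decides whether to consume the lines lazily or collect them all into a list, which sidesteps the question of a global record separator.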

I think we need a non-optional space to follow the << in the case of double-quotes to disambiguate with <<>> qw lists.

Apoc 2 explicitly states that the space will be optional.

The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.
This should probably be a link to something defining exactly what whitespace is in perl. I suspect we should follow Unicode's definition of whitespace -- possibly disallowing the zero-width whitespace, for sanity reasons. Come to think of it, I think Unicode has both the concept of "whitespace" and of "word-separating characters", which aren't necessarily the same -- zero-width non-breaking space is nonprinting, and doesn't wordbreak, but is whitespace!

Personally, I think this is going a bit overboard; I don't think that
many keyboards have a "zero-width non-breaking space" key, or know of
many text editors that will handle it.  Don't forget that we are
talking about code here, not data.

=head2 Gory Details of parsing quoted constructs
I think this section is going to be very much different -- since the perl6 parser is going to be defined in perl6 regexes, it may just say "see anydelimiter.pl and quoted.pl".

Oh, it already is very different. Take a look at:
/parrot/languages/perl6/P6C/Parser.pm (the quoted_string rule), and its
accompanying explanation in /parrot/languages/perl6/P6C/Tree/String.pm
Note that this package of rules will get much more complicated when
fully complete (and fully accurate), and then again when converted to
regular expressions. Let's not make things too difficult, just for the
sake of some obscurities :)

Joseph F. Ryan
[EMAIL PROTECTED]
