Re: String Literals, take 2

Joseph F. Ryan Mon, 02 Dec 2002 13:43:36 -0800

James Mastros wrote:

Just a few more nits to pick...


On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:

The q() operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimiter instead
of '.  In addition, if the starting delimiter is a part of a paired
set, such as (, [, <, or {, then the closing delimiter may be the
matching member of the set.  In addition, the reverse holds true;
delimiters which are the tail end of a pair may use the starting item
as the closing delimiter.

We need to decide if this is a user doc or a developer doc/language specification. If it's the latter, we need a rigorous definition of what a pair is.


I'm more inclined towards a user doc; a rigorous definition of pairs in
the tests should be good enough for the developers.

There are a few special cases for delimiters; specifically : and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply a property; # is allowed, but there cannot be a
space between the operator and the #.  In addition, comments are not
allowed within # delimited expressions (for obvious reasons).

Are comments ever allowed within q() constructs? If not, ditch the statement about comments not being allowed in q## constructs.


You're right, they're not.  Whoops.

=head3 <<>>; expanding a string as a list.

A set of braces is a special op that evaluates into the list of words
A doubled set of angle brackets (<<text here>>) or a set of double-angle quotation marks (guillemets, «text here»).
contained, using whitespace as the delimiter.  It is similar to qw()
from perl5, and can be thought of as roughly equivalent to:
Are we getting rid of qw()? I assumed that we were keeping it as a longhand form of <<>>/guillemets, just like qq() is the longhand form of "".
C<< "STRING".split(' ') >>
I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The two are equivalent, but only because of special-casing; the second is more explicit.)


Nope, split (' ', $string) is special; it eats up all preceding
whitespace before splitting on the space, while with /\s+/ there
will be an initial empty element.  The example is straight from
perl5's perlop anyways :)
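
For anyone who wants to see the difference rather than take it on faith, here is a rough model in Python (not Perl, and not a proposal for Perl 6). It assumes that Python's argument-less str.split() behaves like perl5's split(' ', $string) and that re.split models split(/\s+/, $string) for input with no trailing whitespace. The function names are made up for illustration.

```python
import re


def split_awk_style(text):
    # Models perl5's split(' ', $string): leading whitespace is skipped,
    # then the string is split on runs of whitespace.
    return text.split()


def split_on_whitespace_regex(text):
    # Models perl5's split(/\s+/, $string) for input without trailing
    # whitespace: leading whitespace produces an initial empty field.
    return re.split(r"\s+", text)


sample = "  foo bar  baz"
print(split_awk_style(sample))
print(split_on_whitespace_regex(sample))
```

The second call reports one more field than the first, which is the initial empty element mentioned above.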

Have these defaults been defined somewhere? I'd rather see them be ', ' and '=>' by default...


Well, that's what the RFC suggested, and there didn't seem
to be many complaints about the defaults in the Apoc
(besides the variable names).  Like I said, I just winged it :)

Note that hashes are unordered, and so the output will be unordered.
Therefore, the following two expressions are equivalent:
Get rid of the therefore; it seems to refer to the preceding sentence, which has nothing to do with the example.
=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and Methods will interpolate their return value into the
string, which will be handled in whichever type the return value is.
Same for object methods.  Note that parens B<are> required during
interpolation so that the parser can disambiguate between object
methods and object members.
Has this been vetted? $(...)/etc seem to cover this case, and & being a qq() metachar makes using qq() strings to print HTML/XML difficult.

Well, it was in Apoc 2:
http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of subroutines
http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of object method calls

=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?

   \t            tab
   \n            newline
   \r            return
   \f            form feed
   \b            backspace
   \a            alarm (bell)
   \e            escape
Can we get some rigor here? Also, is \n the same everywhere, or do we play the same tricks we did with it in p5? (I think it should be the same everywhere, a CR char, "\cM". Disciplines, or encodings, or whatever we're calling them, can take care of it on IO.) Oh, and it might be nice for \0 to be NUL. (This used to be implicit with \0 as octal, but since \0 isn't octal anymore...)


As someone who has had to use NT, Mac OS 9, and Solaris with much
frequency, I can say I very much appreciated the special tricks
that \n did (does).

   \b10        binary char
   \o33        octal char
Numeric Literals, take 3 (http:[EMAIL PROTECTED]/msg00462.html), in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the shorthand form of octal numbers, so it doesn't make much sense for octal character constants to be \o123. Do we want to change shorthand octal literal numbers to 0o123 (I don't like this, it's hard to read), change octal chars to \c123 (can't do this without getting rid of, or changing, \c for control-character), get rid of octal chars entirely, or something else? (Barring a good "something else", I vote for killing octal chars.)

This seems to be going back and forth:

$octal_format = ($octal_format_still_exists) ?
sprintf("\\%s%d",$octals_current_letter_of_the_week, $number) :
undef;
That should clear things up.

   \x1b        hex char
Exactly two digits after the \x? Perl5 attempts to do the right thing either way, but this can be confusing too -- "\xA" eq chr(0xA), "\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".


That was in perl5's perldoc, so I assume it is encouraged.

You brought this up before:
http:[EMAIL PROTECTED]/msg00485.html

I still say to stick with perl5's behavior.
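
To make the three cases quoted above concrete, here is a small Python model of the "take one or two hex digits, greedily" rule. It is only a sketch: expand_hex_escapes is a made-up name, the \x{...} wide form is deliberately left out, and nothing here says what Perl 6 should do.

```python
import re

# A backslash, an 'x', then one or two hex digits (two are taken if present).
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")


def expand_hex_escapes(raw):
    # Replace each \x escape with the character whose ordinal the digits name.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


# Raw strings keep the backslash so that our own function does the parsing.
print(expand_hex_escapes(r"\xA") == chr(0xA))
print(expand_hex_escapes(r"\xABar") == chr(0xAB) + "ar")
print(expand_hex_escapes(r"\xAQux") == chr(0xA) + "Qux")
```

The "B" in "\xABar" is swallowed because it happens to be a hex digit, while the "Q" in "\xAQux" is not, which is exactly the confusion being pointed out.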

   \x{263a}    wide hex char
   \c[            control char

Rigor?  What is \c~?  perl5 thinks it's >, should perl6 agree?


I don't see why it shouldn't.
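
For reference, the perl5 rule as I understand it is "uppercase the character, then XOR with 64". A Python sketch of that rule follows; control_char is a hypothetical helper, and it refuses non-ASCII input because that case is still an open question in this thread.

```python
def control_char(ch):
    # Models perl5's \cX for a single ASCII character: uppercase it,
    # then flip bit 6 (XOR with 64).
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("only single ASCII characters are modelled here")
    return chr(ord(ch.upper()) ^ 64)


print(control_char("[") == chr(27))   # \c[ is ESC
print(control_char("~") == ">")       # the \c~ case asked about above
print(control_char("m") == "\r")      # case-insensitive: same as \cM
```

Under this rule \c? comes out as chr(127), which plain subtraction of 64 would not give.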

How about \c\x{1000} (that's invalid, but you get the point), is that equiv to \x{ff9c}?


No, it's "\c\" ~ "x{1000}"

What about \cé, (e+acute accent), does that capitalize, then subtract 64, or just subtract?
   \N{name}    named Unicode character
Reference to charnames pragmata, or however we end up defining the exact semantics of \N. (Since we don't know yet, just put in a FIXME, I suppose.)

Just recycle perl5's, I suppose. Not *everything* needs to be redone from scratch.

Is there any way to give the ordinal in decimal, like "\d192"? (I'm not sure how useful this would be, but it would be nice parallelism. OTOH, you can use chr() easily enough.)

That is a good point; if there is a 0dxxxxx, then there should be a "\dxxxxx".

=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.

   \L{}        Lowercase all characters within brackets
   \U{}        Uppercase all characters within brackets
   \Q{}        Escape all characters that need escaping
               within brackets (except "}")

Rigor: escape all non-alphanumerics.
Do we still have the other modifiers that p5 supports, \l and \u?


That's a good question.  There was no reference to them in Apoc;
however, that doesn't mean that they are gone.  I haven't a clue,
really.

Do we want a new titlecase modifier, \T{james mastros} eq "James Mastros", doing the Right Thing for other languages, where it isn't so simple (there are complicated cases for this, but IIRC Unicode defines a robust algo to do this)? I'll check on the Unicode stuff if anybody thinks it's a good idea... I'm uncertain, myself; I never liked the qq() case-modifiers, so I don't use them.


There is ucfirst(), which I'm sure could be updated to handle Unicode;
however, I don't know if it is important enough to deserve \T{}.  You
might want to ask Larry :)

A string which is (possibly) interpolated and then executed as a system
command with /bin/sh or its equivalent.   Shell wildcards, pipes, and
redirections will be honored. The collected standard output of the
command is returned; standard error is unaffected. In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, returns a list of lines split
on the standard input separator, or an empty list if the command
failed.
This whole section is very unix-centric, but I'm not certain what to do about that -- the functionality is very system-specific. Also, I suspect we're going to want to rewrite it anyway when we hammer out iterators, files, and context.


Why?

A line-oriented form of quoting is based on the shell "here-document"
s/shell/Unix Bourne shell/
syntax.  Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item. The terminating string
may be either an identifier (a word), or some quoted text. If quoted,
the type of quotes you use determines the treatment of the text, just
as in regular quoting. An unquoted identifier works like double quotes.
The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.
I could have sworn that Larry recently put something out about the edge cases between << heredoc and << beginning-of-qw. I /think/ he said that qw("Foo" bar) must be written as << "Foo" bar>>, because otherwise it would be interpreted as a here-doc ending with Foo with double-quote interpolation. Can anybody find this, or is Larry watching?
Also note that with single quoted here-docs, backslashes are not
special, and are taken for a literal backslash, a behavior that is
different from normal single-quoted strings.
Are \qq()s still special, even in <<'noninterpolating's? Either way, it should be explicitly noted.


As far as I know, *nothing* is special in a single quoted heredoc.

V-Strings are formed when 3 or more digits are joined by decimal points,
with a possible leading v.  The resulting item is then treated like
a string, rather than a number.

=over 3
Examples:
 $var = v5.8.0; # $var = "5.8.0";
 $var = 192.168.0.1; # $var = "192.168.0.1";
=back

Note that the v is non-optional for two-character v-strings.


Good point, because otherwise it's a number.  Definitely
needs to be added to the test suite.

I'd say something like:
V-strings are actually strings that just happen to look like numbers. Each dot-separated number is transformed into the character with that Unicode ordinal, and the resulting characters are concatenated together.

(The transformation from normal string to v-string looks like C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the transformation from v-string to normal string looks like
C<<print join '', map {chr} split /\./, $vstring>>;
(Where vstring cannot begin with a leading 'v', for purposes of illustration.))

Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>

Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).
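
The two transformations described above can be checked with a quick Python model. This is a sketch only: vstring_to_text and text_to_vstring are made-up names, and the code works on the textual form of a v-string (it says nothing about how the Perl 6 parser would recognise one).

```python
def vstring_to_text(literal):
    # Drop an optional leading 'v', then turn each dot-separated number
    # into the character with that ordinal and join the characters.
    body = literal[1:] if literal.startswith("v") else literal
    return "".join(chr(int(part)) for part in body.split("."))


def text_to_vstring(text):
    # Inverse direction: one decimal ordinal per character, joined with
    # dots, with a leading 'v'.
    return "v" + ".".join(str(ord(ch)) for ch in text)


print(vstring_to_text("80.101.114.108.32.54.33") == "Perl 6!")
print(vstring_to_text("v5.8.0") == "\x05\x08\x00")
print(text_to_vstring("Perl 6!") == "v80.101.114.108.32.54.33")
```

This agrees with the corrections above: v5.8.0 is three control characters, not the text "5.8.0".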


You're right, the vstring section should be totally redone.

Thanks for the feedback.

Joseph F. Ryan
[EMAIL PROTECTED]
