This revision should be much more thorough and consistent than
the last two, and it also incorporates all of the major rulings handed
down by Larry in the last few days.

Remaining Issues:

- Default Object Stringification
(I'd say that defining custom stringification should go in the OO
 section, even if only so we can procrastinate it and move on to
 Apoc 3 soon ;)

- Reference Stringification

- Semantics for \c[]

- Default values for hash and array stringification.

- Names for hash and array stringification properties.


Joseph F. Ryan
[EMAIL PROTECTED]


=pod

=head1 Strings

A literal string is formed when text is enclosed by a quoting
operator; there are two types: interpolating and non-interpolating.
Interpolating constructs insert (interpolate) the value of an
expression into the string in place of themselves. The simplest
examples of the two types of quoting operators are strings delimited
by double (interpolating) and single (non-interpolating) quotes.

Certain characters, known as meta characters, have special
meaning within a literal string. The most basic of these is the
backslash (C<\>), which is special in both interpolated and
non-interpolated strings.  The backslash makes ordinary characters
special and special characters ordinary.  Non-interpolated strings
have only two meta characters: the backslash itself and the character
that is being used as the delimiter.  Interpolated strings have many
more meta characters; see the section on Escaped Characters below.

The most basic expression that may be interpolated is a scalar
variable. In non-interpolating constructs, a variable name that
appears within the string is used as-is. For example:

  'The quick brown $animal'
  "The quick brown $animal"

Perl takes each character in the first string literally and performs
no special processing. In the second string, however, the value of the
variable $animal is inserted in place of the text $animal. If $animal
had the value "fox", then the second string would become
"The quick brown fox".

More on the various quoting operators below.

=head2 Non-Interpolating Constructs

Non-Interpolating constructs are strings in which expressions do not
interpolate or expand.  The exception to this rule is that the
backslash character, \, will escape the character that immediately
follows it.

The base form for a non-interpolating string is the single-quoted
string: 'string'.  However, non-interpolating strings can also be formed
with the q[] operator.  The q[] operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimiter instead
of '.  In addition, if the starting delimiter is part of a paired
set, such as [, <, or {, then the closing delimiter may be the
matching member of the set.  The reverse also holds true:
delimiters which are the tail end of a pair may use the starting item
as the closing delimiter.

Examples:

   $string = 'string'  # $string = 'string'
   $string = q|string| # $string = 'string'
   $string = q{string} # $string = 'string'
   $string = q]string[ # $string = 'string'

There are a few special cases for delimiters; specifically :, ( and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply an attribute.  ( is not allowed because it is used to
pass arguments to attributes.  Finally, # is allowed, but there cannot
be a space between the operator and the #.
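
The delimiter rules above can be modelled with a short sketch.  It is
written in Python purely as an illustration and is not taken from any
Perl 6 implementation; the names C<PAIRS>, C<FORBIDDEN> and
C<closing_delimiter> are hypothetical, and only the pairs named in the
text are modelled.

```python
# Hypothetical model of the q[] delimiter rules; illustration only.
# Only the paired sets named in the text are included here.
PAIRS = {"[": "]", "<": ">", "{": "}"}
# The reverse also holds: the tail end of a pair may open the string,
# in which case the starting item closes it.
PAIRS.update({closer: opener for opener, closer in list(PAIRS.items())})

# ":" is reserved for attributes, "(" for attribute arguments.
FORBIDDEN = {":", "("}


def closing_delimiter(opener):
    """Return the character expected to close a string opened with `opener`."""
    if len(opener) != 1:
        raise ValueError("a delimiter is a single character")
    if opener.isspace() or opener.isalnum():
        raise ValueError("delimiters may not be spaces, letters, or digits")
    if opener in FORBIDDEN:
        raise ValueError("%r is not allowed as a delimiter" % opener)
    # Unpaired characters such as | or # close themselves.
    return PAIRS.get(opener, opener)
```

Under these assumptions, C<closing_delimiter("{")> gives C<}> and
C<closing_delimiter("]")> gives C<[>, matching the examples above.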

=head3 Embedding Interpolated Strings

It is also possible to embed an interpolating string within a
non-interpolating string by the use of the \qq[] construct.  A string
inside a \qq[] construct acts exactly as if it were an interpolated
string.  Note that any closing delimiter, such as "]", must be escaped
within the \qq[] construct so that the parser can read it correctly.

Examples ( assuming C<< $var="two" >> ):

   $string = 'one \qq{$var} three'    # $string = 'one two three'
   $string = 'one\qq{ {$var\} }three' # $string = 'one {two} three'

=head3 <<>>; expanding a string as a list.

A set of angle brackets is a special op that evaluates to the list of
words it contains, using whitespace as the delimiter.  It is similar to
qw[] from Perl 5, and can be thought of as roughly equivalent to:
C<< "STRING".split(' ') >>

Examples:

   @array = <one two three>; # @array = ('one', 'two', 'three');
   @array = <one <\> three>; # @array = ('one', '<>', 'three');
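
As a rough model of the behaviour shown in these examples, the sketch
below (in Python, for illustration only) splits the text between the
angle brackets on whitespace and then drops the backslash from each
escaped character.  The function name C<word_list> is hypothetical, and
the handling of escapes is an assumption inferred from the second
example.

```python
import re


def word_list(body):
    """Model of <...>: split on whitespace, then unescape each word.

    `body` is the text between the opening < and the closing >.
    """
    # str.split() with no argument splits on runs of whitespace.
    words = body.split()
    # Replace each backslash-escaped character with the character itself.
    return [re.sub(r"\\(.)", r"\1", word) for word in words]
```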

=head2 Interpolating Constructs

Interpolating constructs are another form of string, in which certain
expressions embedded in the string are expanded into their values at
runtime.  Interpolated strings are formed using the double
quote: "string".  In addition, qq[] is a synonym for "", just as
q[] is a synonym for ''.  The rules for interpolation are as
follows:

=head3 Interpolation Rules

=over 3

=item Scalars: C<"$scalar">, C<"$(expression)">
Non-Reference scalars will simply interpolate as their value.  $()
forces its expression into scalar context; the result is then handled
as either a scalar or a reference, depending on how the expression
evaluates.

=item Lists: C<"@list">, C<"@(expression)">
Arrays and lists are interpolated by joining their list elements by the
list's separator attribute, which is by default a space.  Therefore, the
following two expressions are equivalent:

   print "@list";
   print "" ~ @list.join(@list.separator) ~ "";

=item Hashes: C<"%hash">, C<"%(expression)">
A hash interpolates by joining its pairs on its .separator attribute,
which by default is a newline.  Pairs stringify by joining the key and
value with the hash's .pairsep attribute, which by default is a space.
Note that hashes are unordered, and so the output will be unordered.
The following two expressions are equivalent:

    print "%hash";
    print "" ~
          join( %hash.separator,
                map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys )
          ~ "";

=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and methods will interpolate their return value into the
string; the value is then handled according to its type (scalar, list,
hash, and so on).  Note that parens B<are> required during
interpolation so that the parser can disambiguate between object
methods and object members.

=item References C<"$ref">
# Behavior not defined

=item Default Object Stringification C<"$obj">
# Behavior not defined

=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?

   \t          tab
   \n          newline
   \r          return
   \f          form feed
   \b          backspace
   \a          alarm (bell)
   \e          escape
   \0b10       binary char
   \o33        octal char
   \0o33       octal char
   \x33        hex char
   \0x1b       hex char
   \0x[263a]   wide form
   \c[expr]    Named Unicode Character or special notation

=item Modifiers: C<\Q[]>, C<\L[]>, C<\U[]>

Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.

   \l          Lowercase the following character.
   \u          Uppercase the following character.
   \L[]        Lowercase all characters within brackets
   \U[]        Uppercase all characters within brackets
   \Q[]        Escape all non-alphanumerics within
               brackets (except "}")

=back
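
The list and hash rules above amount to two nested joins.  The sketch
below models them in Python for illustration; the function names are
hypothetical, and the default separators are the ones proposed in this
draft (a space for lists, a newline and a space for hashes), which are
still listed as open issues.

```python
def stringify_list(items, separator=" "):
    """Model of "@list": join the elements on the list's separator."""
    return separator.join(str(item) for item in items)


def stringify_hash(mapping, separator="\n", pairsep=" "):
    """Model of "%hash": join key and value on pairsep, pairs on separator."""
    pairs = (str(key) + pairsep + str(value) for key, value in mapping.items())
    return separator.join(pairs)
```

Note that a Python dict keeps insertion order, whereas the draft says
that hash output is unordered, so only the joining logic carries over.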

=head3 Stopping Interpolation (\Q)

Within an interpolated string, perl will always try to take the
longest possible expression to interpolate. For instance,
C<"@list[0]"> will interpolate element C<0> of the array C<@list>. If
you want perl to include the array C<@list> followed by the string
C<"[0]">, then you need to use the null string (specified by C<\Q>):

Example:

   @list = (1,2);
   print "@list\Q[0]"; # prints '1 2[0]'

=head3 Embedding non-interpolated constructs: C<\q[]>

It is possible to embed a non-interpolated string within an
interpolated string using \q[].  Any characters within a \q[]
construct are treated as if they were in a non-interpolated string.

Example:

   "string \q{$variable}" # $variable will not be interpolated

=head3 C<qx[]>, backticks (C<``>)

This construct yields a string which is (possibly) interpolated and
then executed as a system command with /bin/sh or its equivalent.
Shell wildcards, pipes, and redirections will be honored. The collected
standard output of the command is returned; standard error is
unaffected. In scalar context, it comes back as a single (potentially
multi-line) string, or undef if the command failed. In list context, it
returns a list of lines split on the standard input separator, or an
empty list if the command failed.
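
The two return conventions can be sketched as follows.  This is a
Python model for illustration only, not the Perl 6 implementation: the
names C<qx_scalar> and C<qx_list> are hypothetical, "failed" is assumed
to mean a non-zero exit status, and lines are assumed to be separated
by newlines.

```python
import subprocess


def qx_scalar(command):
    """Model of qx[] in scalar context: all of stdout, or None on failure."""
    # shell=True hands the command to the system shell (/bin/sh on POSIX),
    # so wildcards, pipes, and redirections are honored.  Only stdout is
    # captured; stderr is left alone.
    result = subprocess.run(command, shell=True,
                            stdout=subprocess.PIPE, text=True)
    # Assumption: a non-zero exit status counts as failure.
    return result.stdout if result.returncode == 0 else None


def qx_list(command):
    """Model of qx[] in list context: a list of lines, or [] on failure."""
    output = qx_scalar(command)
    if output is None:
        return []
    # Keep the line terminators, as reading lines from a file would.
    return output.splitlines(keepends=True)
```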

=head2 Special Quoting

=head3 Here-Docs

A line-oriented form of quoting is based on the shell "here-document"
syntax.  Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item.

Also note that with single-quoted here-docs, backslashes are not
special, and are taken for a literal backslash, a behavior that is
different from normal single-quoted strings.  However, \qq[] will
still work.

Examples:

   print << EOF;
   The price is $Price.
   EOF

   print << "EOF"; # same as above
   The price is $Price.
   EOF

   print << "EOF"; # same as above
   The price is $Price.
       EOF

   print << `EOC`; # execute commands
   echo hi there
   echo lo there
   EOC

   print <<"foo", <<"bar"; # you can stack them
   I said foo.
   foo
   I said bar.
   bar

   myfunc(<< "THIS", 23, <<'THAT');
   Here's a line
   or two.
   THIS
   and here's another.
   THAT

Don't forget that you have to put a semicolon on the end to finish the
statement, as Perl doesn't know you're not going to try to do this:

   print <<ABC
   179231
   ABC
   + 20;

If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:

   ($quote = <<'FINIS') =~ s:e/^^\s+//;
       The Road goes ever on and on,
       down from the door where it began.
   FINIS
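
For comparison, the same cleanup step can be sketched in Python.  This
is an illustration under stated assumptions rather than a translation
of the substitution above: the name C<strip_leading_whitespace> is
hypothetical, and the pattern matches only spaces and tabs (slightly
narrower than C<\s+>) so that blank lines inside the text are kept.

```python
import re


def strip_leading_whitespace(text):
    """Remove leading spaces and tabs from every line of `text`."""
    # With re.MULTILINE, ^ matches at the start of each line,
    # not just at the start of the whole string.
    return re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
```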

If you use a here-doc within a delimited construct, such as in s:e//$()/,
the quoted material must come on the lines following the final
delimiter. So instead of

   s:e/this/$(<<E ~ 'that'
   the other
   E
   ~ 'more ')/;

you have to write

=over 3

   s:e/this/$(<<E ~ 'that'
   ~ 'more ')/;
   the other
   E

=back

=head3 V-Strings

V-strings are actually strings that just happen to look like numbers.
Each dot-separated number is transformed into the character with
that Unicode ordinal, and the resulting characters are concatenated
together.

The transformation from normal string to v-string looks like:

   $vstring = 'v' ~ join '.', map {ord} split //, $instring;

The transformation from v-string to normal string looks like:

   $normal = join '', map {chr} (m:eS/[ ^v? | ^<before \d> | \. ](\d+)/);

Thus, the following three expressions are equivalent:

   $var = 'Perl 6!';
   $var = v80.101.114.108.32.54.33;
   $var = chr(80)~chr(101)~chr(114)~chr(108)~chr(32)~chr(54)~chr(33);
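
The two transformations can be checked with a small model.  The sketch
below is written in Python for illustration only; C<to_vstring> and
C<from_vstring> are hypothetical names, and the parsing is simplified
to picking out every run of digits.

```python
import re


def to_vstring(text):
    """Normal string to v-string: 'v' plus the dot-joined ordinals."""
    return "v" + ".".join(str(ord(ch)) for ch in text)


def from_vstring(vstring):
    """V-string to normal string: one character per run of digits."""
    return "".join(chr(int(number)) for number in re.findall(r"\d+", vstring))
```

With these definitions, C<to_vstring("Perl 6!")> gives
C<v80.101.114.108.32.54.33>, and C<from_vstring> reverses it.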

=head2 Gory Details of parsing quoted constructs

No string section would be complete without a "Gory details of parsing
quoted constructs"; however, since the current implementation in P6C
doesn't have support for \Q, \Q[], \L[], \U[], \N{name}, or \x[], the
implementation may have to change.  If you really need your blood and
guts, please see P6C/Tree/String.pm for the current string-parsing
semantics.

=cut
