This revision should be much more thorough and consistent than the last two, and it also incorporates all of the major rulings handed down by Larry in the last few days.
Remaining Issues:

- Default Object Stringification (I'd say that defining custom stringification should go in the OO section, even if only so we can procrastinate it and move on to Apoc 3 soon ;)
- Reference Stringification
- Semantics for \c[]
- Default values for hash and array stringification.
- Names for hash and array stringification properties.

Joseph F. Ryan
[EMAIL PROTECTED]

=pod

=head1 Strings

A literal string is formed when text is enclosed by a quoting operator. There are two types of quoting operator: interpolating and non-interpolating. Interpolating constructs insert (interpolate) the value of an expression into the string in place of themselves. The simplest examples of the two types are strings delimited by double (interpolating) and single (non-interpolating) quotes.

Certain characters, known as meta characters, have special meaning within a literal string. The most basic of these is the backslash (C<\>), which is special in both interpolated and non-interpolated strings. The backslash makes ordinary characters special and special characters ordinary. Non-interpolated strings have only two meta characters: the backslash itself and the character that is being used as the delimiter. Interpolated strings have many more meta characters; see the section on Escaped Characters below.

The most basic expression that may be interpolated is a scalar variable. In non-interpolating constructs, a variable name that appears within the string is used as-is. For example:

    'The quick brown $animal'
    "The quick brown $animal"

In the first string, perl takes each character literally and performs no special processing. In the second string, however, the value of the variable C<$animal> is inserted in place of the text C<$animal>. If C<$animal> had the value "fox", the second string would become "The quick brown fox".

More on the various quoting operators below.
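As a rough illustration of the difference described above, here is a minimal Python model of the two quoting styles. It is only a sketch: the function names C<non_interpolating> and C<interpolating> are hypothetical, only simple C<$name> scalars are handled, and backslash escapes are ignored entirely.

```python
import re


def non_interpolating(text: str) -> str:
    """Model of a single-quoted string: the text is used as-is."""
    return text


def interpolating(text: str, variables: dict) -> str:
    """Model of a double-quoted string: each $name is replaced by its value.

    Assumption: only simple $name scalars are recognised; escapes, arrays,
    hashes and $(...) expressions are outside this sketch.
    """
    def lookup(match):
        return str(variables[match.group(1)])

    return re.sub(r"\$([A-Za-z_]\w*)", lookup, text)


variables = {"animal": "fox"}
print(non_interpolating("The quick brown $animal"))
print(interpolating("The quick brown $animal", variables))
```

The first call leaves the text C<$animal> in place; the second replaces it with the value stored under C<animal>.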
=head2 Non-Interpolating Constructs

Non-interpolating constructs are strings in which expressions do not interpolate or expand. The exception to this rule is that the backslash character, \, will escape the character that immediately follows it.

The base form for a non-interpolating string is the single-quoted string: 'string'. However, non-interpolating strings can also be formed with the q[] operator. The q[] operator allows strings to be made with any non-space, non-letter, non-digit character as the delimiter instead of '. In addition, if the starting delimiter is part of a paired set, such as [, <, or {, then the closing delimiter may be the matching member of the set. The reverse also holds true: delimiters which are the tail end of a pair may use the starting item as the closing delimiter.

Examples:

    $string = 'string'    # $string = 'string'
    $string = q|string|   # $string = 'string'
    $string = q{string}   # $string = 'string'
    $string = q]string[   # $string = 'string'

There are a few special cases for delimiters; specifically :, ( and #. : is not allowed because it might be used by custom-defined quoting operators to apply an attribute. ( is not allowed because it is used to pass arguments to attributes. Finally, # is allowed, but there cannot be a space between the operator and the #.

=head3 Embedding Interpolated Strings

It is also possible to embed an interpolating string within a non-interpolating string by the use of the \qq[] construct. A string inside a \qq[] construct acts exactly as if it were an interpolated string. Note that any closing delimiter, such as "]", must be escaped within the \qq[] construct so that the parser can read it correctly.

Examples (assuming C<< $var="two" >>):

    $string = 'one \qq{$var} three'      # $string = 'one two three'
    $string = 'one\qq{ {$var\} }three'   # $string = 'one {two} three'

=head3 <<>>; expanding a string as a list.
A set of angle brackets is a special op that evaluates to the list of words it contains, using whitespace as the delimiter. It is similar to qw[] from perl5, and can be thought of as roughly equivalent to: C<< "STRING".split(' ') >>

Examples:

    @array = <one two three>;    # @array = ('one', 'two', 'three');
    @array = <one <\> three>;    # @array = ('one', '<>', 'three');

=head2 Interpolating Constructs

Interpolating constructs are another form of string, in which certain expressions embedded in the string are expanded into their values at runtime. Interpolated strings are formed using the double quote: "string". In addition, qq[] is a synonym for "", similarly to q[] being a synonym for ''. The rules for interpolation are as follows:

=head3 Interpolation Rules

=over 3

=item Scalars: C<"$scalar">, C<"$(expression)">

Non-reference scalars simply interpolate as their value. $() forces its expression into scalar context; the result is then handled as either a scalar or a reference, depending on how the expression evaluates.

=item Lists: C<"@list">, C<"@(expression)">

Arrays and lists are interpolated by joining their elements with the list's .separator attribute, which is by default a space. Therefore, the following two expressions are equivalent:

    print "@list";
    print "" ~ @list.join(@list.separator) ~ "";

=item Hashes: C<"%hash">, C<"%(expression)">

Hashes interpolate by joining their pairs with the hash's .separator attribute, which by default is a newline. Pairs stringify by joining the key and value with the hash's .pairsep attribute, which by default is a space. Note that hashes are unordered, and so the output will be unordered. The following two expressions are equivalent:

    print "%hash";
    print "" ~ join( %hash.separator, map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys ) ~ "";

=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">

Subroutines and methods interpolate their return value into the string; the return value is then handled according to its type.
Same for object methods. Note that parens B<are> required during interpolation so that the parser can disambiguate between object methods and object members.

=item References C<"$ref">

# Behavior not defined

=item Default Object Stringification C<"$obj">

# Behavior not defined

=item Escaped Characters

# Basically the same as Perl5; also, how are locale semantics handled?

    \t          tab
    \n          newline
    \r          return
    \f          form feed
    \b          backspace
    \a          alarm (bell)
    \e          escape
    \0b10       binary char
    \o33        octal char
    \0o33       octal char
    \x33        hex char
    \0x1b       hex char
    \0x[263a]   wide form
    \c[expr]    Named Unicode Character or special notation

=item Modifiers: C<\Q[]>, C<\L[]>, C<\U[]>

Modifiers apply a modification to the text which they enclose; they can be embedded within interpolated strings.

    \l      Lowercase the following character.
    \u      Uppercase the following character.
    \L[]    Lowercase all characters within brackets
    \U[]    Uppercase all characters within brackets
    \Q[]    Escape all non-alphanumerics within brackets (except "}")

=back

=head3 Stopping Interpolation (\Q)

Within an interpolated string, perl will always try to take the longest possible expression to interpolate. For instance, C<"@list[0]"> will interpolate element C<0> of the array C<@list>. If you want perl to include the array C<@list> followed by the string C<"[0]">, then you need to use the null string (specified by C<\Q>).

Example:

    @list = (1,2);
    print "@list\Q[0]";    # prints '1 2[0]'

=head3 Embedding non-interpolated constructs: C<\q[]>

It is possible to embed a non-interpolated string within an interpolated string using \q[] (or, equivalently, \q{}). Any characters within the construct are treated as if they were in a non-interpolated string.
Example:

    "string \q{$variable}"    # $variable will not be interpolated

=head3 C<qx[]>, backticks (C<``>)

A string which is (possibly) interpolated and then executed as a system command with /bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected. In scalar context, it comes back as a single (potentially multi-line) string, or undef if the command failed. In list context, it returns a list of lines split on the standard input separator, or an empty list if the command failed.

=head2 Special Quoting

=head3 Here-Docs

A line-oriented form of quoting is based on the shell "here-document" syntax. Following a << you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item.

Also note that with single-quoted here-docs, backslashes are not special and are taken as literal backslashes, a behavior that is different from normal single-quoted strings. However, \qq[] will still work.

Examples:

    print << EOF;
    The price is $Price.
    EOF

    print << "EOF";    # same as above
    The price is $Price.
    EOF

    print << `EOC`;    # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"foo", <<"bar";    # you can stack them
    I said foo.
    foo
    I said bar.
    bar

    myfunc(<< "THIS", 23, <<'THAT');
    Here's a line or two.
    THIS
    and here's another.
    THAT

Don't forget that you have to put a semicolon on the end to finish the statement, as Perl doesn't know you're not going to try to do this:

    print <<ABC
    179231
    ABC
    + 20;

If you want your here-docs to be indented with the rest of the code, you'll need to remove leading whitespace from each line manually:

    ($quote = <<'FINIS') =~ s:e/^^\s+//;
       The Road goes ever on and on,
       down from the door where it began.
    FINIS

If you use a here-doc within a delimited construct, such as in s:e//$()/, the quoted material must come on the lines following the final delimiter. So instead of

    s:e/this/$(<<E ~ 'that'
    the other
    E
    ~ 'more ')/;

you have to write

=over 3

    s:e/this/$(<<E ~ 'that'
    ~ 'more ')/;
    the other
    E

=back

=head3 V-Strings

V-strings are actually strings that just happen to look like numbers. Each dot-separated number is transformed into the character with that Unicode ordinal, and the resulting characters are concatenated together. The transformation from normal string to v-string looks like

    $vstring = 'v' ~ join '.', map {ord} split //, $instring;

The transformation from v-string to normal string looks like

    $normal = join '', map {chr} (m:eS/[ ^v? | ^<before \d> | \. ](\d+)/);

Thus, the following three expressions are equivalent:

    $var = 'Perl 6!';
    $var = v80.101.114.108.32.54.33;
    $var = chr(80)~chr(101)~chr(114)~chr(108)~chr(32)~chr(54)~chr(33);

=head2 Gory Details of parsing quoted constructs

No string section would be complete without a "Gory details of parsing quoted constructs"; however, since the current implementation in P6C doesn't have support for \Q, \Q[], \L[], \U[], \N{name}, or \x[], the implementation may have to change. If you really need your blood and guts, please see P6C/Tree/String.pm for the current string-parsing semantics.

=cut
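For anyone who wants to check the V-string round trip described in the V-Strings section without a working P6C, the two transformations can be modelled in Python. This is only a sketch of the ordinal arithmetic, not of the Perl 6 parsing rules; the function names C<to_vstring> and C<from_vstring> are hypothetical.

```python
def to_vstring(text: str) -> str:
    """Join the Unicode ordinal of each character with dots, prefixed by 'v'."""
    return "v" + ".".join(str(ord(ch)) for ch in text)


def from_vstring(vstring: str) -> str:
    """Turn each dot-separated number back into the character with that ordinal.

    Assumption: the leading 'v' is optional, and empty fields are skipped.
    """
    body = vstring[1:] if vstring.startswith("v") else vstring
    return "".join(chr(int(field)) for field in body.split(".") if field)


print(to_vstring("Perl 6!"))
print(from_vstring("v80.101.114.108.32.54.33"))
```

The first call prints v80.101.114.108.32.54.33 and the second prints Perl 6!, matching the equivalence listed in the V-Strings section.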