This and other RFCs are available on the web at http://dev.perl.org/rfc/ =head1 TITLE Heredoc contents =head1 VERSION Maintainer: Richard Proctor <[EMAIL PROTECTED]> Date: 27 Aug 2000 Last Modified: 1 Oct 2000 Mailing List: [EMAIL PROTECTED] Number: 162 Version: 2 Status: Frozen =head1 ABSTRACT The content of a Heredoc is normally included into the program verbatim. RFC 111 allows whitespace (and comments) on the terminator. This RFC covers the content. It introduces the <<< Enhanced Heredoc that removes whitespace and discusses the provision of other dequoting options in the library and documentation enhacements that should follow. =head1 DESCRIPTION =head2 Preamble I originally wanted to remove leading whitespace from the lines in a heredoc, several other people wanted to remove whitespace equivalent to the shortest span of whitespace at the start of lines, or the whitespace from the first line. TomC pointed out ways to achieve the removal of the whitespace in current perl, although this sort of works (as long as the user is consistent about use of spaces and tabs). I would like to make life easier. This attempts to bring all these ideas together. =head2 Discussion of options There are several possible ways that have been discussed: a) No Indenting - this is the current behaviour of <<. b) Remove all leading whitespace from all lines of input. This was not popular - no longer supported in this RFC. c) Remove whitespace equivalent to the first line of the Heredoc This was not popular - it did not fit many peoples requirements. d) Remove whitespace equialent to the smallest whitespace - a Realistic option, this can be performed by using regexes and the dequote function. e) Remove whitespace equialent to the terminator - a realistic option. This takes the whitespace off the content equivalent to that on the terminator and removes that amount of whitespace from the content. (This is now proposed for <<<). f) Using a Heredoc and a regex to remove unwanted whitespace. TomC provided some examples showing how this would work, and howw this could handle many of the options above. g) Using a Heredoc and a function to handle the dequoting of the content. This is essentially the same as a regex, but allows common types of dequoting to be written once. =head2 Agreements There are three things that have been agreed:- =head3 Enhanced Heredoc There will be two types of heredocs, the simple <<POD which just includes the contents of until the POD terminator and an enhanced <<<POD which removes whitespace equivalent to that on the terminator from each line of the content (case e above). (Note the enhacements to the terminator in RFC 111 apply in both cases). =head3 Distribute a collection of dequote() mutations with perl These are a set of enhanced dequoting options that can strip of all leading whitespace with all the options mentioned above, treatement for variable expansion and perhaps procedure call expansion. These would be part of the standard library. Names and content to be discussed. [ NOT as part of this RFC ] =head3 Mention the s/// tricks in the documentation In the discussion that followed this RFC various ways using regexes were shown that could achieve most of what people want. Some of these should be included as examples in the documentation. =head2 Tabs Some debate took place on tabs in the whitespace. There were two considerations: a) The problem comes with mixing editors - some use tabs for indented material some dont, some reduce files using tabs etc etc. [I move between too many editors]. Perl should DWIM. I think that treating tabs=8 as the default would work for most people, even those who set tabs at other values as long as they are consistent - a "use tabs 4" could be used by them if they want to get the same behaviour if they mix tabs and spaces. b) Tabs are easy, don't expand them. Consider them as a literal character. This assums that the code author is going to use the same keystrokes to indent their here-doc text as the terminator, about as safe an assumption as any for tabs. There was more support for the second case than the first. =head2 dequoting example TomC in the debate provided this example, which works as long as there are no inconsistent tabs in the whitespace. $poem = dequote<<EVER_ON_AND_ON; Now far ahead the Road has gone, And I must follow, if I can, Pursuing it with eager feet, Until it joins some larger way Where many paths and errands meet. And whither then? I cannot say. --Bilbo in /usr/src/perl/pp_ctl.c EVER_ON_AND_ON print "Here's your poem:\n\n$poem\n"; The following C<dequote> function handles all these cases. It expects to be called with a here document as its argument. It looks to see whether each line begins with a common substring, and if so, strips that off. Otherwise, it takes the amount of leading white space found on the first line and removes that much off each subsequent line. sub dequote { local $_ = shift; my ($white, $leader); # common white space and common leading string if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) { ($white, $leader) = ($2, quotemeta($1)); } else { ($white, $leader) = (/^(\s+)/, ''); } s/^\s*?$leader(?:$white)?//gm; return $_; } =head2 example s/// tricks print <<"EOF" =~ /^\s*\| ?(.*\n)/g; | Attention criminal slacker, we have yet | to receive payment for our legal services. | | Love and kisses | EOF print <<FOO =~ /^\s+(.*\n)/g; Attention, dropsied weasel, we are launching our team of legal beagles straight for your scrofulous crotch. xx oo FOO =head1 CHANGES RFC 162 V2 - Added a lot more material and the conclusions from the list =head1 IMPLENTATION This should be a relatively simple addition to perl. The <<< would just be to scan_heredoc in toke.c + docs in perl5. The dequote mutations would be in the standard library. =head1 REFERENCES RFC111 - Here Docs Terminators and lots of discussion on the list with significant input from Micael Schwern, Tom Christiansen, Eric Roode and others.