RFC 162 (v2) Heredoc contents

Perl6 RFC Librarian Sun, 01 Oct 2000 17:32:03 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Heredoc contents

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 27 Aug 2000
  Last Modified: 1 Oct 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 162
  Version: 2
  Status: Frozen

=head1 ABSTRACT

The content of a Heredoc is normally included into the program verbatim.
RFC 111 allows whitespace (and comments) on the terminator.  This RFC covers
the content.  It introduces the <<< Enhanced Heredoc, which removes leading
whitespace, and discusses providing other dequoting options in the standard
library, together with the documentation enhancements that should follow.

=head1 DESCRIPTION

=head2 Preamble

I originally wanted to remove all leading whitespace from the lines in a
heredoc.  Several other people wanted to remove whitespace equivalent to the
shortest run of leading whitespace on any line, or to the whitespace on the
first line.  TomC pointed out ways to remove the whitespace in current perl;
these mostly work, as long as the user is consistent about the use of spaces
and tabs.  I would like to make life easier.  This RFC attempts to bring all
these ideas together.

=head2 Discussion of options

Several approaches have been discussed:

a) No Indenting - this is the current behaviour of <<.

b) Remove all leading whitespace from all lines of input.

This was not popular and is no longer proposed in this RFC.

c) Remove whitespace equivalent to the first line of the Heredoc

This was not popular; it did not fit many people's requirements.

d) Remove whitespace equivalent to the shortest run of leading whitespace on
any line - a realistic option; this can be done with regexes or a dequote
function.

e) Remove whitespace equivalent to that on the terminator - a realistic
option.  The amount of whitespace preceding the terminator is removed from
the start of each line of the content.  (This is now proposed for <<<.)

f) Using a Heredoc and a regex to remove unwanted whitespace.
TomC provided some examples showing how this would work, and how it could
handle many of the options above.

g) Using a Heredoc and a function to handle the dequoting of the content.
This is essentially the same as a regex, but allows common types of
dequoting to be written once.
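Option d is easy to approximate in current Perl with a short function.  The
sketch below is an illustration rather than one of the examples from the
list: the name C<dequote_min> is hypothetical, only spaces and tabs count as
indentation, and a tab counts as a single character, so it assumes the
indentation is consistent.

```perl
use strict;
use warnings;

# Hypothetical helper for option d: find the shortest run of leading blanks
# on any non-blank line and remove that many characters from every line.
# A tab counts as one character, so indentation must be consistent.
sub dequote_min {
    my ($text) = @_;
    my $min;
    for my $ws ($text =~ /^([ \t]*)\S/mg) {
        $min = length $ws if !defined $min || length($ws) < $min;
    }
    return $text unless $min;          # nothing common to remove
    $text =~ s/^[ \t]{0,$min}//mg;     # whitespace-only lines lose up to $min
    return $text;
}

print dequote_min(<<'END');
    Now far ahead the Road has gone,
       And I must follow, if I can,
END
```

The relative indentation of the second line (three extra spaces) is kept;
only the common four-space margin is removed.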

=head2 Agreements

Three things have been agreed:

=head3 Enhanced Heredoc

There will be two types of heredoc: the simple <<POD, which includes the
contents up to the POD terminator verbatim, and the enhanced <<<POD, which
removes whitespace equivalent to that on the terminator from each line of the
content (case e above).  (Note that the enhancements to the terminator in
RFC 111 apply in both cases.)
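The behaviour proposed for <<< can be approximated in current Perl, where the
terminator of a plain << heredoc must still start in column 0.  This is a
sketch under stated assumptions: the helper C<strip_terminator_indent> is
hypothetical and is handed the whitespace that <<< would read from the
terminator; tabs are compared literally (tab option b below); and lines that
do not begin with the whole indent are left unchanged, which this RFC does
not specify.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the proposed <<<: remove, from the
# start of each line, exactly the whitespace string $indent that would
# precede the terminator. Tabs and spaces are compared literally, and a
# line that does not begin with the whole of $indent is left unchanged.
sub strip_terminator_indent {
    my ($text, $indent) = @_;
    $text =~ s/^\Q$indent\E//gm;
    return $text;
}

# With plain << the terminator EOT must sit in column 0, so the indent
# that <<< would take from the terminator is passed in explicitly here.
my $letter = strip_terminator_indent(<<'EOT', '    ');
    Dear sir,
      Thank you for your letter.
EOT
print $letter;    # "Dear sir," then "  Thank you for your letter."
```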

=head3 Distribute a collection of dequote() mutations with perl

These would be a set of enhanced dequoting functions that can strip leading
whitespace in any of the ways mentioned above, handle variable expansion, and
perhaps handle procedure call expansion.  These would be part of the standard
library.  Names and content are still to be discussed.

[ NOT as part of this RFC ]

=head3 Mention the s/// tricks in the documentation 

In the discussion that followed this RFC, several regex-based techniques were
shown that achieve most of what people want.  Some of these should be
included as examples in the documentation.

=head2 Tabs

Some debate took place on tabs in the whitespace.  There were two
considerations:

a) The problem comes with mixing editors: some use tabs for indented material
and some don't, some convert runs of spaces to tabs to shrink files, and so
on.  [I move between too many editors.]  Perl should DWIM.  I think that
treating tabs as 8 columns by default would work for most people, even those
who set tab stops at other values, as long as they are consistent; anyone who
mixes tabs and spaces with a different tab width could use something like
"use tabs 4" to get the same behaviour.

b) Tabs are easy: don't expand them, but treat them as literal characters.
This assumes that the code author uses the same keystrokes to indent the
here-doc text as the terminator, which is about as safe an assumption as any
for tabs.

There was more support for the second case than the first.  
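To make tab option a) concrete, here is a sketch of it in Perl.  The helper
C<strip_columns> is hypothetical; it uses the core C<Text::Tabs> module to
expand tabs to 8-column stops, and it assumes that a line with less
indentation than the terminator simply loses all of its leading whitespace.
Option b) needs no helper: it is a literal C<s/^\Q$indent\E//gm>.

```perl
use strict;
use warnings;
use Text::Tabs ();    # core module; provides expand() and $Text::Tabs::tabstop

# Hypothetical helper for tab option a): expand tabs to 8-column stops,
# measure the terminator's indentation in columns, and remove that many
# columns of leading whitespace from each line (or all of it, if a line
# has less). Any remaining leading whitespace comes back as spaces.
sub strip_columns {
    my ($text, $indent) = @_;
    local $Text::Tabs::tabstop = 8;
    my $cols  = length Text::Tabs::expand($indent);
    my @lines = split /^/m, $text;
    for my $line (@lines) {
        my ($ws)     = $line =~ /^([ \t]*)/;
        my $expanded = Text::Tabs::expand($ws);
        my $drop     = length($expanded) < $cols ? length($expanded) : $cols;
        $line = substr($expanded, $drop) . substr($line, length $ws);
    }
    return join '', @lines;
}

# A tab-indented line and an 8-space-indented line are equivalent here,
# so both lose their indentation.
print strip_columns("\tfoo\n        bar\n", "\t");
```

Under option b) the same input would lose only the tab on the first line,
because a tab and eight spaces are different strings.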

=head2 dequoting example

TomC provided this example during the debate.  It works as long as the
leading whitespace does not mix tabs and spaces inconsistently.


        $poem = dequote<<EVER_ON_AND_ON;
               Now far ahead the Road has gone,
                  And I must follow, if I can,
               Pursuing it with eager feet,
                  Until it joins some larger way
               Where many paths and errands meet.
                  And whither then? I cannot say.
                        --Bilbo in /usr/src/perl/pp_ctl.c
        EVER_ON_AND_ON
        print "Here's your poem:\n\n$poem\n";

The following C<dequote> function is meant to be called with a here document
as its argument.  It checks whether every line begins with a common leading
string of non-word characters (such as C<|> or C<#>) and, if so, strips that
off.  Otherwise, it takes the amount of leading whitespace on the first line
and removes that much from every line.

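        sub dequote {
            local $_ = shift;
            my ($white, $leader);  # common white space and common leading string
            if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
                ($white, $leader) = ($2, quotemeta($1));
            } else {
                # No common leader: use the first line's leading white space.
                # Assign separately so a failed match cannot shift '' into $white.
                ($white) = /^(\s+)/;
                $white  = '' unless defined $white;
                $leader = '';
            }
            s/^\s*?$leader(?:$white)?//gm;
            return $_;
        }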

=head2 example s/// tricks

     print <<"EOF" =~ /^\s*\| ?(.*\n)/gm;
         | Attention criminal slacker, we have yet
         | to receive payment for our legal services.
         |
         |     Love and kisses
         |
     EOF

     print <<FOO =~ /^\s+(.*\n)/gm;
             Attention, dropsied weasel, we are
             launching our team of legal beagles
             straight for your scrofulous crotch.

                     xx oo
     FOO
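The same approach can strip all leading whitespace in place, which is option
b from the discussion above.  This is a sketch rather than one of TomC's
examples: it copies the heredoc into C<$text> with the
C<(my $var = ...) =~ s///> idiom, and uses C<[ \t]+> rather than C<\s+>
because C<\s> also matches newlines and would swallow blank lines.

```perl
use strict;
use warnings;

# Copy the heredoc into $text, then delete the leading blanks of every line.
# [ \t]+ cannot cross a newline, so the empty line in the middle survives.
(my $text = <<'END') =~ s/^[ \t]+//gm;
    Attention, dropsied weasel,
        we are launching our legal beagles.

    xx oo
END

print $text;
```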


=head1 CHANGES

RFC 162 V2 - Added a lot more material and the conclusions from the list

=head1 IMPLEMENTATION

This should be a relatively simple addition to perl. 

In perl5, the <<< would need only a change to scan_heredoc in toke.c, plus
documentation.

The dequote mutations would be in the standard library.

=head1 REFERENCES

RFC111 - Here Docs Terminators

and lots of discussion on the list, with significant input from Michael
Schwern, Tom Christiansen, Eric Roode and others.