Re: [PATCH 1/2] <<# indent-stripping heredoc

Martin D Kealey Tue, 18 Jul 2023 00:06:17 -0700

On Sat, 15 Jul 2023, 10:18 Dennis Williamson, <dennistwilliam...@gmail.com>
wrote:


> Would a declared spaces-per-level indent amount be useful? Something like
> <<4# that would be the script writer's responsibility to conform to since
> they set it. This might be in addition to the already proposed <<#.
>

I briefly pondered using <<#number#token or <<#token#number, but forbore to
suggest it because I was hoping nobody would ask.

I consider it a feature rather than a bug if we discourage "dirty"
indenting (indenting that doesn't match "^\t* *"), because otherwise we
need to answer questions like: if the user designates two-space tabs, and
the indicator line specifies that tab+tab+space (or equivalent) should be
removed, how should we handle a line that starts with 3 tabs? Should that
be rendered as a single-space indent of the heredoc content? What about a
line that starts with 3 tabs and a space? Is that two spaces, a tab, or
something else? I think it's less likely to produce arguments (or
confusion) if we simply declare that those maverick indents are invalid.
So regardless of whether or not we restrict the indenting to tabs followed
by spaces, and whether the indicator line is first or last, I think the
lines within the heredoc should be checked for the exact byte sequence
from the indicator line.
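A minimal Python sketch of that exact-match rule, purely to illustrate it (this is not shell code). It assumes the indicator line's leading whitespace has already been extracted and is passed in as `indicator_indent`; the name `strip_exact_indent`, the `require_clean` flag, and the choice to let completely empty lines through are hypothetical.

```python
import re

# "Clean" indentation: zero or more tabs followed by zero or more spaces.
CLEAN_INDENT = re.compile(r"\t* *")


def strip_exact_indent(lines, indicator_indent, require_clean=True):
    """Remove indicator_indent from the start of every heredoc line.

    Assumption: indicator_indent is the leading whitespace of the
    indicator line (first or last line; that choice is left open).
    Each line must begin with exactly that byte sequence, otherwise it
    is rejected instead of being reinterpreted.
    """
    if require_clean and CLEAN_INDENT.fullmatch(indicator_indent) is None:
        raise ValueError("indicator indent must be tabs followed by spaces")
    body = []
    for lineno, line in enumerate(lines, start=1):
        if line == "":
            # Assumption: completely empty lines are accepted as-is.
            body.append(line)
        elif line.startswith(indicator_indent):
            body.append(line[len(indicator_indent):])
        else:
            raise ValueError(f"line {lineno}: indent does not match indicator line")
    return body
```

With an indicator indent of tab+tab+space, a line starting with 3 tabs is simply rejected, so the two-space-tab questions above never arise.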

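As an alternative, I wonder whether we should go whole hog on
normalisation: expand all the tabs to spaces while reading the heredoc
within the script, and then compact spaces back into tabs when sending the
heredoc as input to the program. Perhaps a syntax like:
<<#EndMark[#[M][#[N]]]
where M is the tab width assumed when reading the script and N is the tab
width used when compacting spaces back into tabs (as in <<#EOF#8#4 below).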
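To make the bracketed grammar concrete, here is a hypothetical Python helper that splits an operator into its fields. It assumes EndMark contains no '#', and it only separates the fields; how a missing or empty M or N is interpreted is left to the forms described in the text. The margin form <<#token#c is deliberately not matched.

```python
import re

# <<#EndMark[#[M][#[N]]]  (assumption: EndMark contains no '#')
OPERATOR = re.compile(r"<<#([^#]*)(?:#(\d*)(?:#(\d*))?)?")


def parse_operator(text):
    """Split a proposed <<# operator into (end_mark, m, n).

    None means the field and its '#' are absent; "" means the '#' is
    present but the number is empty. Returns None if text does not fit.
    """
    match = OPERATOR.fullmatch(text)
    if match is None:
        return None
    return match.groups()
```

For example, <<#EOF#4# yields ("EOF", "4", ""), which distinguishes the "no tabs in the output" form from <<#EOF#4, where N is absent.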

With only one '#' like <<#EOF, all tabs and spaces are stripped from the
beginning of each line (like <<-EOF but stripping spaces as well as tabs)
before sending it as input to the program.
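The difference from the existing <<- operator (which strips leading tabs only) can be shown with a tiny Python sketch; both function names are hypothetical and only model the stripping step, not heredoc parsing.

```python
def strip_like_dash(lines):
    """Existing <<- behaviour: remove leading tabs only."""
    return [line.lstrip("\t") for line in lines]


def strip_like_hash(lines):
    """Proposed single-'#' <<#EOF behaviour: remove leading tabs and spaces."""
    return [line.lstrip(" \t") for line in lines]
```

A line indented with spaces before a tab keeps its whole indent under <<-, but loses it entirely under the single-'#' form.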

With 2 or 3 '#':

  <<#EOF# - the heredoc indent is defined as some number of tabs followed
    by some number of spaces, which are removed from the input to the
    program.

  <<#EOF#4 - all whitespace at the start of the line is normalised
    assuming 4-space tabs; then the requisite number of spaces is removed;
    then all leading units of 4 spaces are converted back to tabs, and the
    result is sent to the program.

  <<#EOF#4# - all whitespace at the start of the line is normalised
    assuming 4-space tabs; then the requisite number of spaces is removed;
    no leading tabs are sent as input to the program.

  <<#EOF#8#4 - all whitespace at the start of the line is normalised
    assuming 8-space tabs; then the requisite number of spaces is removed;
    then all leading units of 4 spaces are converted back to tabs, and the
    result is sent to the program.
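A rough Python sketch of the normalise/strip/re-tab pipeline in these forms. It assumes "the requisite number of spaces" means the display width of the indicator line's indent after tab expansion, and that whitespace-only lines come out empty; `leading_columns`, `strip_normalised`, `in_tab` (M) and `out_tab` (N) are hypothetical names, with `out_tab=None` standing in for the trailing-'#' form.

```python
def leading_columns(line, tab_width):
    """Return (display width of leading blanks, remainder of the line)."""
    col = 0
    i = 0
    while i < len(line) and line[i] in " \t":
        if line[i] == "\t":
            col += tab_width - col % tab_width  # advance to the next tab stop
        else:
            col += 1
        i += 1
    return col, line[i:]


def strip_normalised(lines, indicator_indent, in_tab, out_tab):
    """Model <<#EOF#M#N with in_tab = M and out_tab = N.

    For <<#EOF#4 pass out_tab=in_tab; for <<#EOF#4# pass out_tab=None,
    which emits the remaining indent as spaces only.
    """
    # Assumption: the amount to remove is the indicator line's indent width.
    base, _ = leading_columns(indicator_indent, in_tab)
    body = []
    for line in lines:
        col, rest = leading_columns(line, in_tab)
        if rest == "":
            body.append("")  # assumption: whitespace-only lines become empty
            continue
        if col < base:
            raise ValueError(f"line indented less than the indicator line: {line!r}")
        col -= base
        if out_tab:
            tabs, spaces = divmod(col, out_tab)  # re-tab in units of out_tab
            body.append("\t" * tabs + " " * spaces + rest)
        else:
            body.append(" " * col + rest)
    return body
```

With 4-space tabs, "\t    foo" and "\t\tbar" both normalise to 8 columns, so after removing a one-tab indicator indent they come out identically; that is the point of normalising before comparing.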

I'm in two minds about whether to support the <<#EOF##4 variant, where the
indentation must match exactly but the tabstops in the input are
normalised. I suspect it would cause more confusion than it would save,
and it could be added later if people really think it's useful.

To change tack completely, another possibility would be "margins" rather
than fixed indents:

  <<#token#c - remove all leading whitespace followed by the character c,
which must be a punctuation mark (other than '#') so that it doesn't
conflict with the other forms above.
Then we can write:

 cat <<##|
   |  line 1 indent 2 space
   |line 2 no indent

(No EndMark is necessary; we simply stop when we reach a line that does not
start with optional whitespace followed by c.)
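The margin form is easy to model; this hypothetical Python sketch (`read_margin_heredoc` is an invented name) collects lines until the first one that is not optional blanks followed by the margin character, leaving that line for the shell to read next.

```python
import string


def read_margin_heredoc(lines, margin):
    """Model <<#token#c: collect lines of the form <blanks>c<text>.

    Stops at the first line that does not match; that line is not
    consumed. Returns (body, number_of_lines_consumed).
    """
    if len(margin) != 1 or margin not in string.punctuation or margin == "#":
        raise ValueError("margin must be one punctuation character other than '#'")
    body = []
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.startswith(margin):
            break  # end of heredoc: no EndMark needed
        body.append(stripped[1:])  # drop the margin character itself
    return body, len(body)
```

Fed the two '|' lines from the example above followed by an ordinary command, it returns the two stripped lines and reports that only 2 lines were consumed.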

-Martin
