On Sat, 15 Jul 2023, 10:18 Dennis Williamson, <dennistwilliam...@gmail.com> wrote:
> Would a declared spaces-per-level indent amount be useful? Something like
> <<4# that would be the script writer's responsibility to conform to since
> they set it. This might be in addition to the already proposed <<#.

I briefly pondered using <<#number#token or <<#token#number, but forbore to suggest it because I was hoping nobody would ask.

I consider it a feature rather than a bug if we discourage "dirty" indenting (that doesn't match "^\t* *"), because otherwise we need to answer questions like: if the user designates two-space tabs, and the indicator line specifies that tab+tab+space (or equivalent) should be removed, how should we handle a line that starts with 3 tabs? Should that be rendered as a single space indent of the heredoc content? What about a line that starts with 3 tabs and a space? Is that two spaces or a tab or something else?

I think it's less likely to produce arguments (or confusion) if we simply declare that those maverick indents are invalid.

So, regardless of whether or not we restrict the indenting to tabs-followed-by-spaces, and whether the indicator line is first or last, I think the lines within the heredoc should be checked for the exact byte sequence from the indicator line.

As an alternative, I wonder whether we should go whole hog on normalisation: expand all the tabs to spaces while reading the heredoc within the script, and then compact spaces back into tabs when sending the heredoc as input to the program.

Perhaps a syntax like

    <<#EndMark[#[M][#[N]]]

With only one '#' like <<#EOF, all tabs and spaces are stripped from the beginning of each line (like <<-EOF but stripping spaces as well as tabs) before sending it as input to the program.

With 2 or 3 '#':

<<#EOF# - the heredoc indent is defined as some number of tabs followed by some number of spaces, which are removed from the input to the program.
<<#EOF#4 - all whitespace at the start of the line is normalized assuming 4-space tabs; then the requisite number of spaces are removed; then all leading units of 4 spaces are converted back to tabs, and the result sent to the program.
<<#EOF#4# - all whitespace at the start of the line is normalized assuming 4-space tabs; then the requisite number of spaces are removed; no leading tabs are sent as input to the program.
<<#EOF#8#4 - all whitespace at the start of the line is normalized assuming 8-space tabs; then the requisite number of spaces are removed; then all leading units of 4 spaces are converted back to tabs, and the result sent to the program.

I'm in two minds whether I would support the <<#EOF##4 variant where the indentation must match exactly, but the tabstops in the input are normalised. I suspect this would cause more confusion than it would save, and it could be added later if people really think it's useful.

To change tack completely, another possibility would be "margins" rather than fixed indents:

<<#token#c - remove all leading whitespace followed by the character c, which must be a punctuation mark (other than '#') so that it doesn't conflict with the other forms above.

Then we can write:

    cat <<##|
        |  line 1 indent 2 space
        |line 2 no indent

(No EndMark is necessary; we simply stop when we reach a line that does not start with c.)

-Martin
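
To make the normalising forms and the margin form above concrete, here is a rough model of how I read them, written in Python because none of this is valid bash today. It is only a sketch: the names split_indent, dedent and strip_margin are made up for illustration, and two behaviours are my own assumptions rather than part of the proposal, namely that blank lines pass through empty and that a line indented less than the indicator line is an error.

```python
def split_indent(line, tab_width):
    """Return (columns, rest): the leading tabs/spaces expanded to a
    column count using tab stops every tab_width columns."""
    col = 0
    pos = 0
    while pos < len(line) and line[pos] in " \t":
        if line[pos] == "\t":
            col += tab_width - col % tab_width  # jump to the next tab stop
        else:
            col += 1
        pos += 1
    return col, line[pos:]


def dedent(lines, indicator, in_tab, out_tab=None):
    """Model of <<#EOF#M (out_tab == in_tab), <<#EOF#M# (out_tab is None)
    and <<#EOF#M#N (out_tab == N). 'indicator' is the indicator line,
    whose expanded indent is the amount removed from every line."""
    strip, _ = split_indent(indicator, in_tab)
    result = []
    for line in lines:
        col, rest = split_indent(line, in_tab)
        if not rest:
            result.append("")  # assumption: blank lines are passed through
            continue
        if col < strip:
            # assumption: under-indented lines are rejected, not guessed at
            raise ValueError("line indented less than the indicator line")
        keep = col - strip
        if out_tab:
            prefix = "\t" * (keep // out_tab) + " " * (keep % out_tab)
        else:
            prefix = " " * keep  # no leading tabs are sent to the program
        result.append(prefix + rest)
    return result


def strip_margin(lines, c):
    """Model of <<#token#c: drop leading whitespace plus the margin
    character c, stopping at the first line that lacks the margin."""
    body = []
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.startswith(c):
            break
        body.append(stripped[len(c):])
    return body
```

With this model, dedent(lines, "\t", 4, 4) corresponds to <<#EOF#4 with an indicator line indented by one tab, dedent(lines, "\t", 4) to <<#EOF#4#, and dedent(lines, "\t", 8, 4) to <<#EOF#8#4. Because the indent is compared in expanded columns, a tab and four spaces are interchangeable under 4-space tabs, which is exactly the property the exact-byte-sequence rule gives up.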