On Tue, Mar 22, 2022, at 9:49 PM, 201009-suckl...@planhack.com wrote:
sed is the canonical paragraph mangler. It's worth spending a bit of time to grok why.

    tr -d '\r' | sed '/^$/!{H;d;};p;x;s/\n/ /g;'

Gutenberg lines are CRLF-terminated so `tr` is needed.

"Greg Reagle" <l...@speedpost.net> wrote:
Right, I forgot to mention that I had to run
  tr -d '\r'
first.  Thanks for mentioning that.

Close, but no cigar. That sed command introduces extra blank lines. It is incorrect. ssam reigns supreme!

  tr -d '\r' < 2488-0.txt | ssam -e 'x/\n+/ v/\n\n+/ c/ /' | wc -l
7667
  tr -d '\r' < 2488-0.txt | sed '/^$/!{H;d;};p;x;s/\n/ /g;' | wc -l
7782

Neither command is correct. Given a file that ends in a single newline, ssam produces output that ends with a single space and no newline. sed prints the empty lines before paragraphs rather than after them, and it drops the last paragraph when the file does not end with an empty line (two newlines).
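
To make the sed half of that concrete, here is a small reproduction. It is only a sketch: the function name old_join and the four-line sample input are made up for illustration, and the sed script is the earlier one, unchanged.

```shell
# old_join: hypothetical wrapper name; the sed script is the earlier one, unchanged.
old_join() {
  sed '/^$/!{H;d;};p;x;s/\n/ /g;'
}

# Made-up sample: two paragraphs, no empty line after the second one.
printf 'a\nb\n\nc\nd\n' | old_join

# Same sample with a trailing empty line added.
printf 'a\nb\n\nc\nd\n\n' | old_join
```

The first run prints an empty line followed by " a b" and never prints the second paragraph. The second run prints both paragraphs, each preceded by an empty line and a leading space.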

I'm not sure how to fix the ssam command (though the fix will probably be more elegant), but this should work for sed:

 sed 'H;$!{/^$/!d};x;s/^\n//;s/\n\(.\)/ \1/g;p;d'
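
For anyone who wants to follow it step by step, the same script can be spread over several lines with comments. This is a sketch of how I read it, not a different method; the function name join_paragraphs and the sample input are made up, and the closing brace sits on its own line so that stricter sed implementations should accept it too.

```shell
# join_paragraphs: hypothetical wrapper name around the one-liner above.
join_paragraphs() {
  sed '
    # Append the current line to the hold space.
    H
    # Not the last line and not empty: still inside a paragraph, start the next cycle.
    $!{
      /^$/!d
    }
    # Paragraph boundary or last line: fetch the collected lines from the hold space.
    x
    # Drop the newline that the first H put in front.
    s/^\n//
    # Turn each newline that is followed by a character into a space.
    s/\n\(.\)/ \1/g
    # Print the joined paragraph, then delete to suppress the automatic print.
    p
    d
  '
}

# Made-up sample: two paragraphs, no empty line at the end of the input.
printf 'a\nb\n\nc\nd\n' | join_paragraphs
```

On that sample it prints "a b", an empty line, then "c d". A trailing empty line in the input, if present, stays attached to the paragraph before it because the second substitution only replaces newlines that have a character after them.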

Cheers,
Felix
