On Tue, Mar 22, 2022, at 9:49 PM, 201009-suckl...@planhack.com wrote:
sed is the canonical paragraph mangler. It's worth spending a bit of
time to grok why that is true.
tr -d '\r' | sed '/^$/!{H;d;};p;x;s/\n/ /g;'
Gutenberg lines are CRLF-terminated so `tr` is needed.
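
For anyone who wants to read the one-liner step by step, here is a rough
annotated spelling of it: the same commands in the same order, just spread
over several lines. The function name join_v1 and the sample input are
made up for illustration.

```shell
# Hypothetical wrapper; the sed script is the one-liner above, one command per line.
join_v1() {
  tr -d '\r' | sed '
    # Non-empty line: append it to the hold space, then drop it from the output.
    /^$/!{
      H
      d
    }
    # Empty line (paragraph break): print the empty line itself,
    p
    # swap in the lines collected so far (the hold space now holds the empty line),
    x
    # and turn the newlines between the collected lines into spaces.
    s/\n/ /g
  '
}

# Made-up CRLF sample: one two-line paragraph followed by an empty line.
printf 'one\r\ntwo\r\n\r\n' | join_v1
```

Nothing is printed until an empty line arrives, so the hold space acts as
the paragraph buffer.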
"Greg Reagle" <l...@speedpost.net> wrote:
Right, I forgot to mention that I had to run
tr -d '\r'
first. Thanks for mentioning that.
Close, but no cigar. That sed command introduces extra blank lines.
It is incorrect. ssam reigns supreme!
tr -d '\r' < 2488-0.txt | ssam -e 'x/\n+/ v/\n\n+/ c/ /' | wc -l
7667
tr -d '\r' < 2488-0.txt | sed '/^$/!{H;d;};p;x;s/\n/ /g;' | wc -l
7782
Neither command is correct. Given a file that ends in a single
newline, ssam leaves the output ending with a single space and no
newline. sed prints the empty line before each paragraph rather than
after it, and it silently drops the last paragraph when the file does
not end with an empty line (two newlines).
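
Both sed problems show up on a tiny input. A minimal sketch, assuming a
POSIX shell and GNU sed; the helper name old_join and the sample text are
made up, and `sed -n l` is only there to make line ends visible as `$`:

```shell
# Hypothetical helper wrapping the sed one-liner under discussion.
old_join() { sed '/^$/!{H;d;};p;x;s/\n/ /g;'; }

# Two paragraphs; the input ends in a single newline, with no trailing empty line.
printf 'a\nb\n\nc\nd\n' | old_join | sed -n l
# prints:
# $
#  a b$
```

The empty line comes out ahead of the first paragraph, and the second
paragraph (c, d) stays in the hold space and is never printed.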
I'm not sure how to fix the ssam command (though the fix will probably
be more elegant), but this should work for sed:
sed 'H;$!{/^$/!d};x;s/^\n//;s/\n\(.\)/ \1/g;p;d'
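
Here is the same script with one command per line and my reading of each
step as a comment. Treat it as a sketch: the function name
join_paragraphs and the sample input are made up, and I have only
reasoned it through for GNU sed.

```shell
# Hypothetical wrapper around the sed command above, one command per line.
join_paragraphs() {
  sed '
    # Append every line to the hold space.
    H
    # Mid-paragraph (line is neither empty nor the last line): nothing to print yet.
    $!{
      /^$/!d
    }
    # Paragraph break or end of input: fetch the collected lines.
    x
    # Drop the newline that H put in front of the first collected line.
    s/^\n//
    # Join lines with spaces, leaving a newline at the very end untouched.
    s/\n\(.\)/ \1/g
    # Print the paragraph, then start the next cycle without autoprinting again.
    p
    d
  '
}

# Made-up sample: two paragraphs, input ends in a single newline.
printf 'a\nb\n\nc\nd\n' | join_paragraphs
```

On that sample the output should be "a b", an empty line, then "c d",
with the empty line after the first paragraph and the last paragraph kept.
As in the earlier commands, pipe Gutenberg text through tr -d '\r' first.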
Cheers,
Felix