At 2023-09-22T10:56:06+0200, H.Merijn Brand wrote:
> Shorted reply. Might expand on this later

No worries. I acknowledge that my emails sometimes resemble homework
assignments.

> I realized when I re-read that this morning and apologized in my reply
> I'll apologize again if that was not clear

I'm not upset with you. I get frustrated with software as well.
(Frequently.) If someone loses their temper, they should expect to be
ignored, reproached, or gently steered back to equanimity. That's just
social dynamics.

> Thanks for the long and clear answer, which I have to re-read a few
> times to get all of the implications. Thanks for the time and effort
> you have put into it to clarify all the points.

You're welcome. I regard it as part of the job. ;-)

> nroff2man added for your amusement. It has never been my intention to
> make that public, but feel free to do with it whatever you like.

I'll offer you some feedback on it that might make your life easier.

> Perl5 uses '~' and '^' quite a lot. '~' as part of '=~' operator is
> probably the most widespread use and '^' inside regular expressions.

It certainly does. But I did feel the need to point out that the
universe of discourse (man pages) is broader than POD.

So, nroff2man...

> #!/pro/bin/perl
> open my $fh, "-|", "nroff", "-mandoc", $nfn;
> print map {
>     s{(?:\x{02dc}|\xcb\x9c )}{~}grx # ~

Okay, it takes me some time to parse perlre, not being a full time Perl
programmer. I'll try to decode this for the benefit of self and others.

?: makes a capture group "clustering" instead of "capturing"; in other
words, no \1, etc., backreference is produced for the () pair.

Braces following \x appear to admit (arbitrarily?) long hexadecimal code
points instead of the byte-oriented \xXX syntax.

Braces are also used here instead of more traditional delimitation[1]
where opening and closing delimiters are identical.

Finally we see 'r' and 'x' options on the replacement. 'r' performs
"non-destructive substitution"--not sure if/how that applies here, and
'x' treats much whitespace as discardable, for readability.
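For anyone following along, here is a minimal sketch of those features
in isolation. The sample string and the second substitution (b -> B)
are made up for illustration and are not part of nroff2man. It suggests
why 'r' matters in a chain like this one: s///r returns the modified
copy instead of changing its operand, so the result can be handed
straight to the next =~.

```perl
use strict;
use warnings;

# Hypothetical sample input: U+02DC SMALL TILDE once as a single
# character, and once as the two bytes of its UTF-8 encoding.
my $in = join "", "a", "\x{02dc}", "b", "\xcb\x9c", "c";

# (?: ... ) groups the alternatives without creating $1.
# /x lets the whitespace inside the pattern be ignored.
# /r returns the changed copy and leaves $in untouched, which is what
# allows the second substitution to be chained on with =~.
my $out = $in =~ s{(?: \x{02dc} | \xcb\x9c )}{~}grx
              =~ s{b}{B}grx;

print "$out\n";
```

On Perl 5.14 or later (where /r first appeared) this prints a~B~c, and
$in still holds the original six characters afterward.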

Thus, what the above does is replace U+02DC (small tilde), whether it
appears as a decoded character or as its raw UTF-8 bytes, with an ASCII
tilde.

>     =~ s{(?:\x{02c6}|\xcb\x86 )}{^}grx # ^

Similar: circumflex accent -> caret.

>     =~ s{(?:\x{2018}|\xe2\x80\x98
>            |\x{2019}|\xe2\x80\x99 )}{'}grx # '

Similar: left and right single quotation marks -> neutral apostrophe.

>     =~ s{(?:\x{201c}|\xe2\x80\x9c
>            |\x{201d}|\xe2\x80\x9d )}{"}grx # "

Similar: left and right double quotation marks -> neutral double quote.

>     =~ s{(?:\x{2212}|\xe2\x88\x92
>            |\x{2010}|\xe2\x80\x90 )}{-}grx # -

Similar: minus sign and hyphen -> hyphen-minus.

>     =~ s{(?:\e\[|\x9b)[0-9;]*m} {}grx # colors

Very different. This attempts to match some forms of ECMA-48 escape
sequence (those introduced by ESC [ or the 8-bit CSI byte 0x9B and
ending in 'm') and remove them.

You can prevent these from showing up in the input in the first place by
passing the '-c' flag to nroff(1). That runs groff(1) and ultimately
grotty(1) with the same option. See grotty(1).

On the other hand, that resorts to overstriking, which you also might
not want. In that case, you can tell grotty(1) to shut off _all_
attempts to represent style changes.

    $ nroff -mandoc -P -cbou

nroff's support for `-P` is new in groff 1.23.0.

Regards,
Branden

[1] nonce word?