Hi folks,

I've started down a road long contemplated.

https://savannah.gnu.org/bugs/index.php?65724

Per discussion with Mike Fulton of IBM over a year ago, and hearing no
contradiction from anyone in the interim, I aim to drop EBCDIC a.k.a.
code page (CCSID) 1047 support from groff 1.24.

I've changed the default startup files in groff Git to no longer load
_either_ cp1047.tmac _or_ latin1.tmac.

The localization files (fr.tmac, de.tmac, etc.) that require support for
ISO Latin-X (or KOI8-R) code points for now continue to load the
appropriate macro files ({latin[1259],koi8-r}.tmac). But those will
probably go away sooner or later.

The idea is, for 1.24, to get everybody migrating to pure ASCII input
documents (as might be generated by preconv(1)) by the time GNU troff
sees them.

Recall that preconv is a preprocessor, and has dedicated groff(1) flags
to make it run, so people can still _maintain their source documents_ in
ISO Latin-X or KOI8-R.

But somebody who has been composing in English and mostly Basic Latin
with the occasional Latin-1 character sprinkled in will stop getting the
output they expect.

$ printf 'You are painfully na\357ve.\n' \
  | ~/groff-stable/bin/troff -z 2>&1 | grep . || echo NO OUTPUT
NO OUTPUT
$ printf 'You are painfully na\357ve.\n' \
  | ~/groff-HEAD/bin/troff -z 2>&1 | grep . || echo NO OUTPUT
/.../groff-HEAD/bin/troff:<standard input>:1: warning: character with input code 239 not defined

One way to check one's input documents to see if they'll have trouble is
to run file(1) on them.

$ printf 'You are painfully na\357ve.\n' >naive.latin1.txt
$ printf 'You are painfully na\\[i ad]ve.\n' >naive.ascii.txt
$ file naive.*
naive.ascii.txt:  ASCII text
naive.latin1.txt: ISO-8859 text

"ISO-8859 text" will be contraindicated for groff 1.24.

This achieved, we can further modify GNU troff to accept and expect
UTF-8 input directly for groff 1.25. And that's, like, in the Mission
Statement or something, which is now 10 years old.

The foregoing will require a NEWS item I haven't written yet.
I expect I won't know everything it needs to say until I've finished the
outripping.

If someone strongly objects, please speak up soon, with a viable
alternative path to GNU troff's recognition of UTF-8, before I do more
radical code deletion.

Regards,
Branden
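
For anyone with many documents to audit, the file(1) check described
above could be scripted roughly as follows. This is only a minimal
sketch, not anything shipped with groff: the function name
non_ascii_files is made up for illustration, and it simply counts bytes
outside the 7-bit ASCII range instead of asking file(1) to guess an
encoding.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff): print the name of each
# argument file that contains at least one byte outside 7-bit ASCII.
non_ascii_files() {
    for f in "$@"; do
        # Delete every byte in the range 0-127 and count what is left.
        # LC_ALL=C makes tr treat the input as bytes, not characters.
        n=$(LC_ALL=C tr -d '\000-\177' <"$f" | wc -c | tr -d ' ')
        if [ "$n" -gt 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example use (assumed file names):
#   non_ascii_files *.ms *.man
```

Files this reports are candidates for a pass through preconv(1), for
example with its -e option to name the source encoding, before GNU troff
sees them.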