Update of bug #63985 (project groff):

                  Status:                    None => Postponed
             Assigned to:                    None => gbranden

_______________________________________________________

Follow-up Comment #3:

[comment #1 comment #1:]
> There's no differentiation between input and output in that snippet, so in
> case anyone is confused by it, Branden must have typed a ^D after the .pl
> line.

Yep--I pasted my shell session and moved on without thinking much about
readability. Whoops!

[comment #2 comment #2:]
> This problem is not limited to characters in the ASCII range; it seems to
> apply to any Latin-1 (groff's native input encoding) character. (The
> following uses a Latin-1-encoded input file and a Latin-1 output
> environment.)
>
> $ cat rchar_test
> .nf
> äbc
> .rchar ä
> äbc
> .pl \n(nlu
> $ nroff -ww rchar_test
> äbc
> äbc

I think this is because the printable characters in the Unicode Latin-1
supplement (U+00A0..U+00FF) are first-class citizens to groff. (Because CCSID
["code page"] 1047 is a rearrangement of ISO 8859 Latin-1, and because GNU
troff is compiled expecting one or the other as its input encoding, the same
characters are first-class citizens in it despite their different code
points.) The planned (but unscheduled) migration to accept UTF-8 input will
abandon that support in favor of being able to interpret UTF-8 multibyte
sequences.

Anyway, as a bit of status, I hit an impediment to implementing this. Almost
everything in the tree is fine with it; all but one of the automated tests
pass. The exception is something internal to the mom(7) package, which
attempts to remove a _whole bunch_ of ordinary ASCII/Basic Latin characters.

So this is on hold pending my exploration of mom internals and a discussion
with Peter Schaffter about alternative solutions or whether, in fact, what mom
is doing today should block this change.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63985>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/