Returning to this issue for a moment before I get back to 1.23 release candidate concerns...
At 2023-04-01T19:45:19-0400, Douglas McIlroy wrote:
> The first use of .char that came to mind was
>     .char \[ntilde] \o'n~'
> which would collide badly with the following ancient trick for
> unbreakable, unpaddable space. (Ignore the question of whether the
> tilde at hand is usable as a diacritical.)
>     .tr ~
>     a~b~c
> This, I guess, is typical of the motivation for the change.

It enables your "ntilde" use case quoted above, but the proposal was
prompted by another that I think I've mentioned but perhaps should pitch
more explicitly: to make `tr` translation maps part of the current
environment since that seems to comport better with their historical
applications.

Here's a concrete example, inspired by Kernighan & Cherry's "Typesetting
Mathematics -- User's Guide (Second Edition)", where the problem you see
below swapped in minus signs for hyphens in the page headers.
[UTF-8 follows.]

$ cat ATTIC/tr-hyphen-to-minus.ms
.ds CH * % *\"
.LP
.\" I get tired of escaping special character escape sequences.
.tr *\(mu
length * width = area
.br
price * quantity = extended price
.br
workers * self-organization = union
.sp 60
skill * experience = craft
$ nroff -ms ATTIC/tr-hyphen-to-minus.ms| cat -s
length × width = area
price × quantity = extended price
workers × self‐organization = union

                             × 2 ×

skill × experience = craft

People seem to use `tr` either for global changes to a document, where
they invoke the request early and never revert it, or for local ones,
where they apply it temporarily and then back it out. However, in the
second case, they will in general have no idea if a trap will spring
before they're done. In groff, you can turn vertical traps off, but you
couldn't in AT&T troff. And doing so may have side effects you don't
want, like overrunning a column bottom or footnote area, or the page
itself.

Applying `tr` only to the current environment would accommodate the
local use case better at the admitted expense of the global one.
This would have to be NEWS-documented. For the sake of historical
documents, we could restrict this behavior change to non-compatibility
mode, since historical documents are the ones most likely to do what I
think is the single most common (albeit historical) global `tr` trick:

.tr ~\" nothing

...which turns the tilde into an unadjustable space. In groff it is
strictly better to use \~, which is not breakable but _is_ adjustable,
or \<space> if you truly do want an unadjustable space. (And you can
still perform these translations explicitly, as with .tr ~\~ .)

> Suppose the change isn't made? What does .char do for you that .ds
> doesn't?

A user-defined character can:

1. participate in kerning adjustments;
2. be assigned "character flags" with `cflags`, as Dave noted--these
   affect how the hyphenation process treats it;
3. be designated as the hyphenation character, tab character, or leader
   character;
4. be `chop`ped off the end of a string atomically;
5. be counted as a single element of a string's contents by the `length`
   request; and
6. be treated atomically by a `for` request, if I implement one as a
   string (and other object) iterator as I plan to for groff-next.
   <https://savannah.gnu.org/bugs/?62264>

This list may not be exhaustive.

A user-defined character cannot be used as the control, no-break
control, or escape character. (The last would have obvious circularity
problems.)

Today I learned that a control character in the ASCII sense, if
otherwise valid as groff input, _can_ be used as the *roff control,
no-break control, or escape character. And now that I have learned it,
I shall do my very best to forget it. I dare not even utter examples
for fear of people like "alex ratchev" on the GNU Bash mailing lists
getting a hold of them, if he in fact has a troff counterpart. The
horror...

To tie this back to `tr` and why these are related discussions, I
presently understand character definitions to be global--
supra-environmental.
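
A minimal sketch of what "global" means for a character definition,
assuming groff and its numbered environment 1; the special character
name \[zz] is borrowed from the example quoted further down and is
purely illustrative. If definitions are indeed supra-environmental, the
definition made in the default environment should still be in force
after switching environments, so both text lines should format the same
way.

```roff
.\" Illustrative only: define a character in the default environment.
.char \[zz] zeezee
.nf
\[zz] top
.\" Switch to environment 1.  Fill mode is per-environment, so it is
.\" turned off again here; the character is not redefined.
.ev 1
.nf
\[zz] top
.\" Return to the previous environment and trim the page length.
.ev
.pl \n(nlu
```

Running this through "nroff" should show whether the second line still
picks up the definition made before the environment switch.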
I aim to sharpen the distinction between translations and character
definitions by retaining character definitions' global application while
subordinating translation maps to the environment.

> Certainly nothing essential in the example above. However, it
> can avoid the ugliness of string invocations.

I think there's so much else going on with user-defined characters that
conceiving of them as a slightly slicker(?), shorter way of achieving
string definitions is a bad idea.

> I regard the potential benefit mentioned in the last sentence as
> unpersuasive, but the potential catastrophe of the initial example as
> tilting the scales toward the proposal.

Thanks! Though I fear losing your support with the rest of this
context, and would appreciate your further perspectives.

At 2023-04-02T08:30:42+0100, Ralph Corderoy wrote:
> > tl;dr: For this input:
> >
> > .tr zx
> > .char \(zz zeezee
> > \(zz top
> >
> > Would you want the output to be "zeezee top" or "xeexee top"?
>
>     $ preconv | nroff
>     .na
>     .nf
>     .
>     .char £ pound sterling
>     .char $ United States dollar
>     .
>     The £ and $ are almost at par.
>     .
>     .tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
>     £ crashes overnight!
>     .
>     .pl \n(nlu
>     ^D
>     The pound sterling and United States dollar are almost at par.
>     POUND STERLING CRASHES OVERNIGHT!
>     $
>
> I'd want to see shouty caps.

I think this is an excellent example of user-defined character abuse.
There's no reason not to use strings here.

.\" set up for portability
.ie \n[.g] \{\
.  ie (\n[.x] > 1) .nr use-new-way 1
.  el .if (\n[.y] >= 23) .nr use-new-way 1
.\}
.ie \n[use-new-way] .als UP stringup
.el \{\
.  de UP
.    tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
.    ds \\$1 \\*[\\$1]
.    tr AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ
.  .
.\}
.rr use-new-way
.\" actual example
.na
.nf
.
.ds P pound sterling\"
.ds D United States dollar\"
.
The \*P and \*D are almost at par.
.
.ds news \*P crashes overnight!
.UP news
\*[news]
.pl \n(nlu

The foregoing becomes much shorter--shorter even than your example--if
one knows one is targeting groff 1.23 or later.

I also went to the trouble of unwinding the translations after using
them, since I think that is a fairer representation of a real-world
troff document. (K&C did this quite a bit.)

Perhaps ironically, if we bind translation maps to environments, then
depending on what you use the environment for, you may indeed be able to
get away with never reverting them. So they make your proof-of-concept
_more_ practical, not less. (Page headers are an obvious application.)

At 2023-04-10T05:10:11-0500, Dave Kemper wrote:
> Yes, I find it handy to be able to set cflags values on .char-defined
> characters.

Thanks, Dave--I don't know that I would have thought of that one in the
near term.

Regards,
Branden