Follow-up Comment #21, bug #66392 (group groff):

Hi Peter & Dave,

At 2025-02-01T12:56:28-0500, Peter Schaffter wrote:
> Why is \n[.hla] not global regardless of ev? It seems an eminently
> reasonable expectation that a document's hyphenation language will
> apply throughout the whole document. I can only think of edge cases
> where one might want to switch hla's, e.g. a document in French with a
> formatted blockquote in Italian.

That is the sort of scenario I was thinking of. But what motivated the
change is the fact that the hyphenation mode itself is not global, but a
property of the environment. I _think_ this is true all the way back to
Ossanna troff but it's tedious to verify that fact, as the value of the
hyphenation mode is not introspectable except via a GNU extension.
(You'd have to infer it by formatting text and seeing if the placement
of the hyphenation breaks changed from one environment to another, given
comparable inputs.)

> Having to explicitly instantiate .hla for every .ev that doesn't call
> .evc 0 makes no sense.

I disagree--here's why.

I think a lot of people assume that when they create a new environment,
it's a copy of environment 0 already. But it isn't. It's a copy of the
_formatter_'s default environment, meaning, in practice, it has the
attributes that correspond to the way its C++ object was constructed
when the formatter started up. This is an implementation detail in
capital letters.

At a more practical level, that "formatter's default" environment is not
affected by anything that happens in the "troffrc" and "troffrc-end"
files.

To be fair, most of what the _stock_ startup files (and each of the
several macro files they source) do alters only global state.[1]

"troffrc" itself sets a register, defines (and then removes) some
strings, and sets up blank-line and leading-space traps (for diagnostic
purposes, which "troffrc-end" later removes).

What about the macro files "troffrc" loads?

"composite.tmac" sets up a handful of composite character mappings.
These are global.

"fallbacks.tmac" creates user-defined characters. Also global.

An output-driver-specific macro file is loaded. These generally do
things like define more characters. Some assign hyphenation codes
(global, but one should feel a twitch here[2]). Some define color names
(global). "pdf.tmac" defines boatloads of macros (global).

A localization macro file is loaded. By default, it's the one for
English, but we do encourage sites to alter this if they wish.

The localization macro file itself loads an encoding macro file, which
sets up input character translations (`trin` requests) and more
hyphenation codes (twitch).

The localization macro file then goes on to configure the inter-sentence
spacing amount (environmental), set up a default hyphenation mode
(environmental), and select that mode (environmental). It sets the
hyphenation language (formerly global, now environmental), and loads
hyphenation pattern files (global, but a separate dictionary for each
hyphenation language code is maintained--so until/unless we support
maintenance of multiple sets of hyphenation patterns for a given
language code,[3] I figure this looks as good as environmental to the
user).

Finally, for convenience, and depending on the output device,
"pdfpic.tmac" or "pspic.tmac" might get loaded. These do only global
stuff, mainly defining namesake (albeit fully capitalized) macros.

The rug may not be pulled yet, but the dog is tugging at a corner of it.

Here's the rug pull. Because we advocate site-local customization of
"troffrc" and "troffrc-end", there's simply no way for us to know of or
prevent the user from putting all kinds of environment-altering stuff in
them.

They might choose an adjustment mode. They might override the line
length. Change the page offset. Alter the type size.

Here's the output of the `pev` request from bleeding-edge GNU troff.

Current Environment:
  previous type size: 10p (10000s)
  type size: 10p (10000s)
  previous requested type size: 10000s
  requested type size: 10000s
  valid type size list for selected font: 1000s-10000000s
  previous default family: 'T'
  default family: 'T'
  previous font selection: 1 ('TR')
  font selection: 1 ('TR')
  space size: 12/12 of font space width
  sentence space size: 12/12 of font space width
  previous line length: 468000u
  line length: 468000u
  previous title line length: 468000u
  title line length: 468000u
  previous line interrupted/continued: no
  filling: on
  alignment/adjustment: both
  previous vertical spacing: 12000u
  vertical spacing: 12000u
  previous post-vertical spacing: 0u
  post-vertical spacing: 0u
  previous line spacing: 1
  line spacing: 1
  previous indentation: 0u
  indentation: 0u
  temporary indentation: 0u
  temporary indentation pending: no
  total indentation: 0u
  previous text length: 0u
  target text length: 0u
  input line start: 0u
  computing tab stops from: input line start
  forcing adjustment: no
  hyphenation language code: en
  hyphenation mode: 4 (on, not allowed within last two characters)
  hyphenation mode default: 4
  count of consecutive hyphenated lines: 0
  consecutive hyphenated line count limit: -1 (unlimited)
  hyphenation space: 0u
  hyphenation margin: 0u
Environment 0:
  current

And it's stuff they _won't get_ automatically when creating a new
environment.

Our documentation should probably urge the user more strongly to, as a
rule, `evc 0` when creating an environment.

All of that said, we _could_ change `ev` to, when creating a new
environment, copy from environment `0` automatically. (I'm not sure how
we would represent a desire to copy the formatter's default environment
though. I hope not with yet another new request.)

But that seemed like a more disruptive and less backward-compatible
change. I think that if people have been creating environments and
_not_ using `evc 0` on them immediately afterward, they've been relying
on luck.
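
A minimal sketch of that `evc 0` advice, applied to the scenario of a
French document with an Italian block quote. The environment name
`quote`, the sample text, and the particular settings are made up for
illustration; the sketch also assumes that hyphenation patterns for the
`it` language code have already been loaded by some other means. None
of this comes from a shipped macro file.

```roff
.\" Environment 0, as the startup files or the document configured it.
.ll 5i
.hla fr
.\" Switch to a named environment ("quote" is a hypothetical name).
.\" On first use it starts from the formatter's built-in defaults,
.\" not from environment 0.
.ev quote
.tm before evc: line length \n[.l]u, language \n[.hla]
.\" Copy environment 0's settings into the current environment.
.evc 0
.tm after evc: line length \n[.l]u, language \n[.hla]
.\" Override only what differs for the Italian block quote
.\" (assumes patterns for "it" were loaded beforehand).
.hla it
Testo della citazione.
.\" Flush the pending output line before leaving the environment.
.br
.\" Pop back to the previous environment.
.ev
```

The two `tm` lines write the line length and hyphenation language code
to the standard error stream before and after the copy, which makes it
possible to see what the new environment started with. The values
reported will depend on the groff version and the startup files in use.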

At 2025-02-01T14:13:16-0500, Dave wrote:
> Follow-up Comment #19, bug #66392 (group groff):
>
> [comment #18 comment #18:]
>> Why is \n[.hla] not global regardless of ev?
>
> By my reading of bug #66387, the salient sentence is, "Pretty weird to
> pop the environment stack and have the hyphenation mode, but not the
> hyphenation _language_, change."
>
> But perhaps this is something that warrants wider discussion.

That could be; I don't mind. But the status quo ante did not look to me
like a situation anyone would expect or desire.

Hmm, I do see that I missed an opportunity to post one of my "trivia
challenges" about it to the list. ;-)

Regards,
Branden

[1] The stock "troffrc" performs one character translation involving the
    non-breaking space. Character translations are presently global but
    I have a notion to make those environmental as well, to avoid a
    problem seen in the real world where a set of translations
    temporarily set up happens to be in force when a page break happens,
    corrupting header and/or footer text. I don't have this work
    scheduled. Like the item in the next footnote, it will demand major
    surgery to reorganize data structures.

[2] It hadn't occurred to me before now that we might need to house the
    hyphenation code assignments in the environment instead. Doing so
    will require some significant refactoring, as presently a
    character's hyphenation code is stored in its `charinfo` object, the
    dictionary of which is global. I have no appetite to add this to my
    groff 1.24 plate.

[3] And I don't know why we would; if we ever need to distinguish
    "en_GB" from "en_US", for example (which really do hyphenate a few
    words differently, I gather), those strings will _be_ the
    hyphenation language codes, and they obviously differ. Similarly,
    if a user wants to set up their own multiple distinct hyphenation
    configurations for a given language, they'd pick distinct
    identifiers (language codes) for them.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66392>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/