Hi Branden,

On Thu Nov 14, 2024 at 2:08 AM CET, G. Branden Robinson wrote:
> [...]
> > It's not so long ago I saw some mentions of support for
> > the \[u_...] characters being added to some driver,
>
> You might be thinking of this:
>
> commit a6289c1508acf31dce73da2ffa9e7de102986298
> Author: G. Branden Robinson <g.branden.robin...@gmail.com>
> Date: Wed Aug 21 08:40:27 2024 -0500
>
>     font/devps/ZD: Regen from updated dingbats.map.
>
>     * font/devps/ZD: Regenerate using updated dingbats.map.
>
>     Fixes <https://savannah.gnu.org/bugs/?63018>. Thanks to Deri James and
>     Dave Kemper for (extensive) consultation.
>
> ...of which part of the commit's diff looks like:
>
> +u27BA 831,579 3 250 a187
> +u27BB 873,578 3 251 a188
> +u27BC 927,542 3 252 a189
> +u27BD 970,616 3 253 a190
> +u27BE 918,593 3 254 a191

I was actually thinking of this:

  * GNU troff now performs some limited processing/transformation of the
    argument to the `\X` escape sequence and its counterpart `device`
    request, to address the requirement that some documents have to pass
    metadata that must encode non-ASCII characters in device extension
    commands. (For example, a document author may desire a document's
    section headings containing non-ASCII code points to appear correctly
    in PDF bookmarks. Further, GNU troff encodes its output page
    description language only in ASCII.) This change is expected to be
    of significance mainly to developers of output drivers for groff;
    groff_diff(7) describes the transformations. If you have been using
    `\X` or `.device` to pass ASCII data to the output driver as a device
    extension command and require that it remain precisely as-is, use the
    `\!` escape sequence or `output` request, and prefix your data with
    "x X ", the device-independent troff means of expressing a device
    extension command (see groff_out(5)).

I remembered it had something to do with asciify and either grops or
gropdf, but forgot the rest...

> > so I figured it might for some reason be much easier than proper UTF-8
> > support.
>
> That's a different part of the problem. We can express any Unicode code
> point in GNU troff _output_. The reason people say "groff doesn't
> support UTF-8" is that GNU troff, the formatter program specifically,
> does not correctly interpret UTF-8-encoded input files.
> [...]

I know, I know. I guess I just don't understand how groff has had Unicode
output for so long and yet input is still lacking. To me, adding UTF-8
support to a program in C means changing char to uint32_t and adding
conversion from UTF-8 strings to Unicode codepoints to the parts that
read data in (i.e. char[1..4] -> uint32_t).
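
To illustrate the char[1..4] -> uint32_t step, here is a minimal sketch of
a strict UTF-8 decoder in C. The name utf8_decode and its interface are
hypothetical; this is not how groff, Heirloom troff, or neatroff do it,
just the conversion in isolation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed (1..4), or 0 if the input is
 * empty, truncated, or malformed (overlong form, surrogate, > U+10FFFF).
 * utf8_decode is a hypothetical name, not part of groff or any troff.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    /* The lead byte determines the sequence length and the top bits. */
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte or invalid lead byte */

    if (len > n)
        return 0;   /* truncated sequence */

    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;   /* overlong, out of range, or surrogate */

    *cp = c;
    return len;
}
```

Called in a loop over an input buffer, it yields one code point per call;
for example, the bytes E2 9E BA decode to U+27BA, one of the dingbats in
the diff quoted above. Widening every char-based table and interface in
the formatter to match is the part this sketch does not show.
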
I realize groff does some pretty complex text processing and it's C++, but
still I wouldn't expect it to be so complex given that both Heirloom troff
and neatroff have UTF-8 input support -- and those are essentially one-man
projects (especially the latter).

> [...]
> > Perhaps, but you said it works fine for "temporary disablement with
> > `nh`". Disabling hyphenation once and for all does not classify as
> > temporary disablement, imho.
>
> You're kind of confusing me here. Whether changing the line length with
> `ll` is "temporary" or not depends on whether you issue a subsequent
> request to do so. In _this_ respect, disabling hyphenation is no more
> or less permanent than most other operations in troff.
> [...]

When you say it "works fine for temporar[ily] disabl[ing]" hyphenation,
I expect there to be some simple way to disable hyphenation and then
return it to the exact same state it had before. That's not the case, as
we've discussed for a while now. Compare with .na and .ad, which actually
DO work fine even though they can be confusing to the beginner.

> [...]
> It's also okay to ask others. That's one of the reasons this mailing
> list is here. Also, occasionally something is hard because troff's
> design isn't everything it could be.

The sort of things I tend to get stuck on are either:
* a complex macro breaking because I made several oversights (or poor
  decisions) when writing it, and its complexity makes fixing this at
  least an hour-long task; these experiences have taught me to make
  macros as simple as possible and to not try to automate everything
  (because fixing it is much harder when it breaks) [and as a result
  I tend to run more into the next one instead...]
* basic troff syntax breaking inside my several-hundred-line macro
  package, but working just fine when I copy it elsewhere; in other
  words, bugs I can't reproduce separately from the rest of the macro
  file

The latter is worse.
I have repeatedly run into an if-elsif-else conditional not working
correctly within a macro file that's loaded with .so, but working just
fine when I paste the macro definition containing it into a new document.
I guess if it happens again I might seek your support; so far I never
really felt like spending more time messing around with troff's horrible
conditionals.

> [...]
> > My proposal was based on the assumption that maintaining compatibility
> > with other troffs is desired.
>
> I'm concerned mainly with compatibility only with AT&T troff.

I see.

I have looked at the adjustment/alignment proposal again. It makes sense,
although I disagree with the addition of .adjust. It seems unnecessary to
me given that .fi doesn't accept a boolean argument either. To me, the
changes which allow .ad/.na to be used just like .fi/.nf are enough.

Given that these changes make .ad finally true to its mnemonic of
"adjust", I would suggest renaming .align to .al because:
* it matches the naming scheme used with .ad
* it seems more natural given the arguments are single characters:
  compare .al r with .align r (one would expect .align right)
* short names make more sense for basic functions that are expected to
  be used often such as adjustment, alignment, filling, and various font
  properties (and all of these currently have short names)
* even many requests added by groff use aggressively shortened names
  (.als instead of .alias being the most salient example), so it cannot
  be argued that long names are somehow preferred

Yes, I know I could do .als al align. It's just that I wish I didn't have
to type that at the top of each document I write in plain troff. And given
how many other basic functions are provided with two-letter requests,
I don't think making this one easier to remember for beginners would be of
much value; they will have to remember all the other ones (or create
aliases for them) anyway.

> Heirloom Doctools troff and neatroff both came along much later and
> I'm not aware that a large corpus of documents has ever been written
> specifically for them.
> [...]

There is also Plan 9 troff, which seems to be a descendant of AT&T troff
with UTF-8 support. Its changelog makes for a fun read:

  December 18, 1992:
    Some people have complete novels as comments, so we need to skip
    comments while checking the legality of font files. thaks Rixh

  May 12, 1993:
    Syntax change
    Some requests accept tabs as a separator, some don't and this can be
    a nuisance. Now a tab is also recognized as an argument separator for
    requests, this makes .so /dev/null works. To be more precise, any
    motion character is allowed, so .so\h'5i'/dev/null will work as well,
    if one really wants that. It will be a problem for users who really
    relied on this as in .ds x string and expect the tab to become part
    of the string a, but I haven't seen any use of that (obscure trick).

  ...

:)

~ onf