Hi guys,

Having been a terminal emulator developer for some years now, I have to say – perhaps surprisingly – that I don't fancy the idea of reusing escape sequences of the terminal world.
(Mind you, I don't find it a good idea to add italic and whatnot formatting support to Unicode at all... but let's put that aside for now.)

There are a lot of problems with these escape sequences, and if you go for a potentially new standard, you might not want to carry these problems over.

There is no well-defined framework for escape sequences. In this particular case you might say it starts with ESC [ and ends with the letter 'm', but how do you know where to end the sequence if that letter 'm' just doesn't arrive? Terminal emulators have extremely complex tables for parsing (and still many of them get plenty of things wrong). It's unreasonable for any random small utility processing Unicode text to go into this business of recognizing all the well-known escape sequences, not even just to the extent of knowing where they end. Whatever is designed should be much more easily parseable. Should you say "everything from ESC[ to m", you'll cause a whole bunch of problems when a different kind of escape sequence gets interpreted as Unicode.

A parser, by the way, would also have to interpret combined sequences like ESC[3;0;1m or the like, for which I don't see a good reason as opposed to having separate sequences for each. Also, it should be carefully evaluated what to do with the C1 control (U+009B) used instead of the C0 ESC[ to open an escape sequence – here terminal emulators vary. These just make everything even more cumbersome.

ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity". It's only nowadays, when most terminal emulators support 256 colors and some even support 16M true colors, that some emulators try to push for this bit unambiguously meaning "bold" only, whereas in most emulators it means "both bold and increased intensity". For compatibility reasons, it won't be a smooth switch.
Note that "bold" and "increased intensity" only go in the same direction with a white-on-black color scheme; with black-on-white, bold stands out more while increased intensity (a lighter shade of gray instead of black) stands out less. (We could also start nitpicking that the spec doesn't even say that increased intensity is just for the foreground and not for the background too.)

Should this scheme be extended for colors, too? What to do with the legacy 8/16 as well as the 256-color extensions wrt. the color palette? Should Unicode go into the business of defining a fixed set of colors, or allow altering the palette colors using the OSC 4 and friends escape sequences, which are supported by about half of the terminal emulators out there?

For 256 colors and truecolor, there are two or three syntaxes out there regarding whether the separator is a colon or a semicolon. ECMA-48 doesn't say anything about it; ITU T.416 does, although it's absolutely not clear. See e.g. the discussion in the comment section of https://gist.github.com/XVilka/8346728 : in Dec 2018, we just couldn't figure out which syntax exactly ITU T.416 wants to specify. Moreover, due to a common misinterpretation of the spec, one of the positional parameters is often omitted.

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m for curly underline. What to do with them? Where to draw the line between what to add to Unicode and what not? Will Unicode possibly be a bottleneck of further improvements in terminal emulators, because from now on every new mode we figure out we'd like to have in terminals should go through some Unicode committee? And what if Unicode wants to have a mode that terminal emulators aren't interested in: who will assign numbers to them that don't clash with terminals? Who will somehow keep the two worlds in sync?

What to do with things that Unicode might also want to have, but that don't exist in terminal emulators due to their nature, such as switching to a different font size?
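
To make the parsing concerns above a bit more concrete, here is a minimal Python sketch of what even a "small utility" would have to do just to locate SGR sequences. It is an illustration only, not taken from any terminal emulator: the function name find_csi and the returned tuple layout are made up for this example, and it assumes the ECMA-48 CSI byte classes (parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E). Real emulators handle far more cases (aborting on other control characters, OSC strings, and so on).

```python
import re

# Assumed CSI structure per ECMA-48: an introducer (ESC [ or the C1 control
# U+009B), then parameter bytes, intermediate bytes, and one final byte.
# The final byte is optional here so that truncated input can be detected.
CSI_RE = re.compile(r"(?:\x1b\[|\x9b)([0-?]*)([ -/]*)([@-~])?")


def find_csi(text):
    """Return a list of (kind, raw, params) tuples (hypothetical layout).

    kind is "sgr" for sequences ending in 'm', "other" for any other
    complete CSI sequence, and "unterminated" if no final byte arrives.
    params is only filled in for SGR: a list of parameters, each of which
    is a list of colon-separated sub-parameters kept as strings.
    """
    results = []
    for match in CSI_RE.finditer(text):
        param_bytes, intermediates, final = match.groups()
        if final is None:
            # The problem from the text: the closing letter never arrived.
            results.append(("unterminated", match.group(0), None))
        elif final == "m" and not intermediates:
            # Combined sequences such as ESC[3;0;1m split on ';',
            # sub-parameters such as ESC[4:3m split on ':'.
            params = [p.split(":") for p in param_bytes.split(";")]
            results.append(("sgr", match.group(0), params))
        else:
            # A different kind of escape sequence (e.g. ESC[2J) that must
            # not be mistaken for formatting.
            results.append(("other", match.group(0), None))
    return results


if __name__ == "__main__":
    samples = [
        "\x1b[3mitalic\x1b[23m",   # plain on/off pair
        "\x1b[3;0;1m",             # combined parameters
        "\x1b[4:3m",               # curly underline extension
        "\x1b[38;2;255;0;0m",      # truecolor, semicolon syntax
        "\x1b[38:2::255:0:0m",     # truecolor, colon syntax
        "\x9b1m",                  # C1 introducer
        "cut off \x1b[3",          # final byte missing
    ]
    for sample in samples:
        print(repr(sample), find_csi(sample))
```

Note how the two truecolor spellings produce differently shaped parameter lists (five one-element parameters versus one six-element parameter), so a consumer has to normalize them itself. An empty parameter, as in ESC[m, comes back as an empty string and would have to be treated as the default value.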
> This mechanism [...] is already supported
> as widely as any new Unicode-only convention will ever be.

I truly doubt this; these escape sequences are specific to terminal emulation, an extremely narrow subset of where Unicode is used and rich text might be desired.

I see it as a much more viable approach if Unicode goes for something brand new, something clean and easily parseable, and it remains the job of specific applications to serve as a bridge between the two worlds. Or, if it wants to adopt some already existing technology, I find HTML/CSS a much better starting point.

regards,
egmont

On Fri, Feb 8, 2019 at 9:55 PM Doug Ewell via Unicode <unicode@unicode.org> wrote:
>
> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:
>
> • Italics on: ESC [3m
> • Italics off: ESC [23m
> • Bold on: ESC [1m
> • Bold off: ESC [22m
> • Underline on: ESC [4m
> • Underline off: ESC [24m
> • Strikethrough on: ESC [9m
> • Strikethrough off: ESC [29m
> • Reverse on: ESC [7m
> • Reverse off: ESC [27m
> • Reset all attributes: ESC [m
>
> where ESC is U+001B.
>
> This mechanism has existed for around 40 years and is already supported
> as widely as any new Unicode-only convention will ever be.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>