On Mon, May 17, 2021 at 09:22:35AM +0200, Uwe Waldmann wrote:
> > According to https://unicodeplus.com/U+A7BA
> >
> > The character Ꞻ (Latin Capital Letter Glottal A) is represented by the
> > Unicode codepoint U+A7BA. It is encoded in the Latin Extended-D block,
> > which belongs to the Basic Multilingual Plane. It was added to Unicode
> > in version 12.0 (March, 2019). It is HTML encoded as &#xA7BA;.
> >
> > xterm #344 is a little earlier than that. Its fallback copy of wcwidth
> > doesn't list that range (I updated the table to Unicode 12 in #345,
> > and added a test-driver around that time).
> >
> > The system wcwidth doesn't cover that range either.
> >
> > Characters which aren't known to wcwidth are treated as nonprinting...
> >
> > (In Debian 9.1, it still worked correctly.)
> >
> > hmm - which version of xterm was that?
> >
> > I'm guessing that it was #327
>
> yes.
>
> > (it should not have worked, but there's always the possibility that I
> > fixed a bug which was making it appear to work)
>
> OK, that's possible. Thanks for the explanation.

In #327, xterm's wcwidth checked if the codes were combining characters
(using a table), or control characters and (for example this case) matched
it against some ranges of double-width characters. If it was none of those,
it assumed single-width.

Starting in #330, I added another table "unknowns" to account for codes
which had no specific width:

                  Patch #330 - 2017/06/20

     * modify wcwidth.c to return -1 for non-Unicode values, and adjust a
       couple of blocks to better match assumptions about ambiguous-width
       characters in other implementations. Also modify wcwidth.c to
       support configurable soft-hyphen, so there is no drawback to using
       this version rather than a system wcwidth.

-- 
Thomas E. Dickey <dic...@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net
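
The two lookup orders described above (#327 versus #330 and later) can be sketched roughly as follows. This is only an illustration under stated assumptions, not xterm's actual wcwidth.c: the function names, the `struct interval` type, and the tiny table excerpts are hypothetical, and the "unknowns" entry simply pretends that the table predates Unicode 12.

```c
#include <stddef.h>

/* Hypothetical helper type: an inclusive range of code points. */
struct interval {
    unsigned long first;
    unsigned long last;
};

/* Illustrative excerpts only; a real implementation has much larger tables. */
static const struct interval combining[]   = { { 0x0300, 0x036F } };
static const struct interval doublewidth[] = { { 0x1100, 0x115F },
                                               { 0x4E00, 0x9FFF } };
/* Assumption: stands in for the "unknowns" table (codes with no width). */
static const struct interval unknowns[]    = { { 0xA7BA, 0xA7BA } };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Linear scan for brevity; returns 1 if ucs lies in any range of the table. */
static int in_table(unsigned long ucs, const struct interval *t, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (ucs >= t[i].first && ucs <= t[i].last)
            return 1;
    }
    return 0;
}

/* #327-style: anything not recognized falls through to single-width. */
static int width_327_style(unsigned long ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                      /* control characters */
    if (in_table(ucs, combining, COUNT(combining)))
        return 0;                       /* combining characters */
    if (in_table(ucs, doublewidth, COUNT(doublewidth)))
        return 2;                       /* double-width ranges */
    return 1;                           /* assumed single-width */
}

/* #330-style: reject non-Unicode values and "unknowns" before the rest. */
static int width_330_style(unsigned long ucs)
{
    if (ucs > 0x10FFFF)
        return -1;                      /* non-Unicode value */
    if (in_table(ucs, unknowns, COUNT(unknowns)))
        return -1;                      /* no specific width known */
    return width_327_style(ucs);
}
```

With these excerpts, width_327_style(0xA7BA) falls through to 1, while width_330_style(0xA7BA) returns -1, which matches the behavior discussed in this thread: the character appeared to work in #327 and is treated as nonprinting once the code is listed as unknown.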