On 11/07/2019, Ingo Schwarze <schwa...@usta.de> wrote:
> Hi Ian,

Hi Ingo,
I've just noticed yet another false positive where Gmail has classified
your email as spam here, for the n-th time now. I'm not sure if that's
just happening to my mailbox, or if it's Gmail-wide, or, worse, if lots
of MTAs out there treat your emails as spam. (There seems to be a trend
where big corps are quite happy to discourage people from running their
own MTAs, and increasingly throw their weight around by rejecting
anything that isn't credentialled up the wazoo with SPF, DKIM, DMARC or
whatever; and of course there are powerful interests who want to
deanonymise the Net, which may be related.) Either way, maybe this is
something you'll want to look into from your end?

> ropers wrote on Thu, Jul 11, 2019 at 12:41:45AM +0200:
>
>> While I'm personally only or mainly interested in Alt+numeric input,
>> if altnumd existed, it would probably be comparatively easy to then
>> extend it and add support for Alt+u0000 thru Alt+u10ffff, with the U
>> becoming a reserved keyword unambiguously signifying that what follows
>> will be a Unicode code point between U+0000 and U+10FFFF.
>
> There is no reason to make it different. ASCII is a subset of Unicode,
> with the same numbering. So the "U" looks redundant to me.

There are several reasons why it isn't redundant:

1. Alt codes are decimal, but Unicode code points are written in
hexadecimal. "INGO" typed as Alt+73, Alt+78, Alt+71, Alt+79 would come
out as "sxqy" if those same numbers were read as hex.

2. Unicode code points (format: U+xxxx, mostly, though they go up to
U+10FFFF) are NOT character bytes. I quoted Wikipedia on this in my
email two days ago:

>> [4] <https://en.wikipedia.org/wiki/Character_encoding#Terminology>:
>> "The compromise solution that was eventually found and developed into
>> Unicode was to break the assumption (dating back to telegraph codes)
>> that each character should always directly correspond to a particular
>> sequence of bits.
>> Instead, characters would first be mapped to a
>> universal intermediate representation in the form of abstract numbers
>> called __code points__. Code points would then be represented in a
>> variety of ways and with various default numbers of bits per character
>> (__code units__) depending on context. To encode code points higher
>> than the length of the code unit, such as above 256 for 8-bit units,
>> the solution was to implement variable-width encodings where an escape
>> sequence would signal that subsequent bits should be parsed as a
>> higher code point."

You are correct that in the case of the variable-length UTF-8, and for
the 128 (non-extended, 7-bit) ASCII characters only, this isn't a
problem, because for those characters, code points (U+xxxx) and code
units (bytes) actually ARE still numerically identical. That saving
grace pretty much does not exist with other, non-UTF-8 Unicode
encodings. Okay, maybe it still does if you drop all the leading
zeroes across multiple bytes. However:

3. I would be wary of dropping leading zeroes in the case of Unicode
code point support. With Alt codes, the precedent of optionally
allowing leading zeroes to be dropped has been set, but pretty much
all Unicode documentation I'm aware of consistently prints code points
in the U+xxxx format (or longer, up to U+10FFFF where applicable).
There's a good argument for supporting code point entry exactly as
written, and nobody writes U+0 through U+FFF. If you install the
gucharmap package <http://ports.su/x11/gnome/gucharmap>, it has a
Character Details tab where you can not only see how much UTF-8 code
units (bytes) can differ from code points (U+xxxx), but also that even
for those low code points where both match, the U+xxxx form is still
printed with its leading zeroes intact.

4. One also should be as restrained and conservative as is practical
in terms of "claiming" key combinations, especially claiming them
system-wide.
Yes, users could set up some hotkey somewhere that kills and relaunches
altnumd (I'm not even sure that belongs in altnumd itself), but you
don't want to do that all the time just to type a key combo that
collides with altnumd. "Hold down Alt while typing U, <x>,<x>,<x>,<x>,
then release Alt" is quite specific, and could reduce cases where a
sequence in .altnumrc collides with something else. "Hold down Alt
while typing up to three digits on the number pad, then release Alt" is
also relatively specific, though perhaps one might accept non-numpad
digit entry too, or make that choice configurable. A single-digit
Alt+<n> is more likely to collide with something, though the
long-standing precedent of Alt codes on at least some platforms may
make that less likely.

5. Perhaps there could be an opportunity for simplifying and unifying
Alt+U<codept> and the existing, iffy Ctrl+Shift+U <xxxx> support?
OTOH, maybe it's better to deliberately not collide with that other
method, and maybe that's a good reason for a universal Unicode code
point method to reside at its own key combo.

Remember, my actual goal is Alt code support, not Alt+U<codept>
support. The opportunity to tack on U<codept> support once Alt+<numpad>
support exists was more of an outgrowth, showing that at least for now,
our desires seem to point in the same direction.

I'll pause here, because this has gotten long. "Further bulletins as
events warrant." Or when I get around to it, rather.

All the best now,
Ian

>> There's a huge competence gap between us,
>
> Quite likely. I'm so clueless that right now, i can't even seem to get
> Compose to work even though i'm sure i had it working in the past.
> This is on amd64-current, inside xterm(1) and ksh(1):
>
> $ locale
> LANG=
> LC_COLLATE="en_US.UTF-8"
> LC_CTYPE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=en_US.UTF-8
> $ setxkbmap -query -v -v -v
> Setting verbose level to 8
> locale is en_US.UTF-8
> Trying to load rules file ./rules/base...
> Trying to load rules file /usr/X11R6/share/X11/xkb/rules/base...
> Success.
> Applied rules from base:
> rules:      base
> model:      pc105
> layout:     de
> Trying to build keymap using the following components:
> keycodes:   xfree86+aliases(qwertz)
> types:      complete
> compat:     complete
> symbols:    pc+de+inet(pc105)+terminate(ctrl_alt_bksp)
> geometry:   pc(pc105)
> rules:      base
> model:      pc105
> layout:     de
>
> At this point, the caps key toggles caps lock, i.e. pressing
>
>   caps a caps a
>
> results in the input "Aa".
>
> $ setxkbmap -option compose:caps -v -v -v
> Setting verbose level to 8
> locale is en_US.UTF-8
> Trying to load rules file ./rules/base...
> Trying to load rules file /usr/X11R6/share/X11/xkb/rules/base...
> Success.
> Applied rules from base:
> rules:      base
> model:      pc105
> layout:     de
> options:    compose:caps
> Trying to build keymap using the following components:
> keycodes:   xfree86+aliases(qwertz)
> types:      complete
> compat:     complete
> symbols:    pc+de+inet(pc105)+terminate(ctrl_alt_bksp)+compose(caps)
> geometry:   pc(pc105)
>
> Now, the caps key no longer toggles caps lock and becomes a dead key,
> i.e. pressing
>
>   caps , c caps " a
>
> results in the input "ca". However, the resulting input is really
> ASCII-c ASCII-a rather than the expected c-cedille a-umlaut.
> It looks like Compose works well enough to discard the , and ",
> but not well enough to actually generate non-ASCII characters.
>
> Somewhat grumpy today,
> Ingo
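P.S.: Points 1 and 2 above can be checked mechanically. Here's a
minimal Python 3 sketch; Python is used purely for brevity, this is
not altnumd code, and the hypothetical Alt+<numpad> and Alt+U<codept>
behaviours are only simulated with ordinary chr()/encode() calls:

```python
# Point 1: Alt codes are decimal.  Feeding the very same digit strings
# through a hexadecimal interpretation yields different characters.
codes = ["73", "78", "71", "79"]
as_decimal = "".join(chr(int(c, 10)) for c in codes)
as_hex = "".join(chr(int(c, 16)) for c in codes)
print(as_decimal)  # INGO
print(as_hex)      # sxqy

# Point 2: a code point is an abstract number, not a byte sequence.
# U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is one code point, but its
# UTF-8 encoding consists of two code units (bytes).
ch = "\u00e9"
print(f"U+{ord(ch):04X}")  # U+00E9 -- note the leading zeroes
print(" ".join(f"{b:02x}" for b in ch.encode("utf-8")))  # c3 a9
```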