On Sun, Jun 20, 2004 at 08:25:06AM +0200, Christian Perrier wrote:
> Quoting Eugeniy Meshcheryakov ([EMAIL PROTECTED]):
> > Christian Perrier wrote:
> > >
> > >Currently unusable in cdebconf (seems to be a whiptail bug in Unicode
> > >environments). Just try to enter any non ASCII character in a dialog
> > >box..:-(
> > >
> > >#251550
> > >
> > I can enter cyrillic characters (that are not ASCII) used in Ukrainian
> > in d-i. This looks more like problem with keymap files.
>
> Hmmmm, so nothing to do with whiptail, then? I'm puzzled.
>
> Let's ask Alastair, he will maybe have some ideas....
>
> Alastair, could you have a look at #251550?
>
> Basically, you just enter a non ASCII character in a dialog during
> Debian Installer 1st stage (when installing in French, German...and
> probably even English), for instance in the dialog asking for a host
> name or IP address.
>
> Then the display seems frozen : typing anything just does nothing. You
> have to hit Ctrl-A for having it working again.
>
> This is a serious problem because any input of such high ASCII
> character will "freeze" the installer, from the user point of view.

This looks somewhat similar to #243373, where output was truncated when illegal UTF-8 sequences were printed. Here input breaks when the keyboard sends illegal UTF-8 sequences. Of course the keyboard should send valid UTF-8 sequences, so one cannot blame whiptail too much.

According to kbd_mode(1), the Linux console keyboard driver has 4 modes; two of them are of interest here, namely the ASCII and UTF-8 modes. Internally the Linux kernel uses Unicode. In UTF-8 mode there is no conversion: characters are passed to the kernel as is (there seems to be a UTF-16 -> UTF-8 conversion, but it can be ignored). In ASCII mode, characters are converted to Unicode using the charset that was in effect when loadkeys was invoked.

On the other hand, keymaps(5) explains how to write keymap files. Characters can be defined numerically (decimal or octal value), literally (e.g. eacute) or by their Unicode code points (e.g. U+00E9). When loadkeys parses keymap files, numerical and literal values are converted to 0-255 values (according to a charset), whereas Unicode values are stored XORed with 0xf000. A value is then (roughly) decoded as follows:

 * if value >= 0x0c00, this is a Unicode character: value ^ 0xf000
 * otherwise the character was given in numerical or literal notation, and its value in the current charset is the least significant byte.

For reasons I do not understand, these two conversions (keyboard mode and input parsing) are mixed together. Now consider this line from fr-latin1:

  keycode 3 = eacute two dead_tilde

  $ kbd_mode -a
  $ export LANG=fr_FR
  $ dumpkeys -n
  ...
  keycode 3 = 0x00e9 0x0032 0x0403 0x0000
  ...
  $ export LANG=fr_FR.UTF-8
  $ unicode_start
  $ dumpkeys -n
  ...
  keycode 3 = 0x00e9 0x0032 0x0403 0x0000
  ...

But now the keyboard is in UTF-8 mode, so bytes are passed to the kernel without conversion, and 0x00 0xe9 is sent instead of its UTF-8 representation 0xc3 0xa9. If eacute is replaced by its Unicode notation U+00E9 in the keymap file, everything works fine.
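The decoding rule and its effect in UTF-8 mode can be modelled in a few lines of Python. This is a simplified sketch, not the actual loadkeys or kernel code: the 0x0c00 threshold and the XOR with 0xf000 come from the description above, while the ISO-8859-1 charset, the "one raw byte" behaviour for charset keysyms in UTF-8 mode, and all function names are assumptions made for illustration. Only plain character keysyms are modelled; dead keys such as 0x0403 are out of scope.

```python
# Simplified model of how keymap values are interpreted, following the
# description above. Assumptions (not taken from the kbd or kernel sources):
#   * the loadkeys charset is ISO-8859-1 (as for fr-latin1),
#   * only plain character keysyms are modelled (not dead keys such as 0x0403),
#   * in UTF-8 mode a charset keysym reaches the application as one raw byte.
# All function names are hypothetical.

UNICODE_THRESHOLD = 0x0C00  # values at or above this came from U+XXXX notation
UNICODE_MASK = 0xF000       # Unicode code points are stored XORed with this mask


def encode_unicode_keysym(codepoint: int) -> int:
    """Store a U+XXXX keymap entry as described above."""
    return codepoint ^ UNICODE_MASK


def decode_keysym(value: int, charset: str = "latin-1") -> str:
    """Return the character a plain keymap value stands for."""
    if value >= UNICODE_THRESHOLD:
        # Unicode notation: undoing the XOR recovers the code point.
        return chr(value ^ UNICODE_MASK)
    # Numerical or literal notation: the least significant byte is a code
    # in the charset that was active when loadkeys ran.
    return bytes([value & 0xFF]).decode(charset)


def bytes_sent_in_utf8_mode(value: int) -> bytes:
    """Model the bytes an application reads when the keyboard is in UTF-8 mode."""
    if value >= UNICODE_THRESHOLD:
        # Unicode keysyms are encoded to UTF-8 before being sent.
        return decode_keysym(value).encode("utf-8")
    # Charset keysyms are passed through unconverted.
    return bytes([value & 0xFF])


if __name__ == "__main__":
    eacute_literal = 0x00E9                         # "eacute", as dumpkeys -n shows it
    eacute_unicode = encode_unicode_keysym(0x00E9)  # "U+00E9" in the keymap file
    print(hex(eacute_unicode))                      # 0xf0e9
    print(bytes_sent_in_utf8_mode(eacute_literal))  # b'\xe9': not valid UTF-8
    print(bytes_sent_in_utf8_mode(eacute_unicode))  # b'\xc3\xa9': UTF-8 for é
```

In this model both notations decode to the same character, but only the U+00E9 form produces valid UTF-8. A lone 0xe9 byte is the lead byte of a three-byte UTF-8 sequence with no continuation bytes, which fits the illegal-sequence symptom described above.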
But this keymap cannot be used when the keyboard is in ASCII mode. The only solution is to have two keymaps, one for ASCII mode and the other for UTF-8 mode. This looks pretty crazy: loadkeys should automatically convert numerical/literal values to Unicode notation (and vice versa) depending on the current keyboard mode.

Denis
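The automatic conversion proposed above could look roughly like the following Python sketch. It is hypothetical: loadkeys has no such functions, and the names, the single-byte ISO-8859-1 default charset, and the rule that only keysyms with a zero high byte count as charset characters are all assumptions. It reuses the 0x0c00 threshold and 0xf000 XOR from the decoding rule described earlier.

```python
# Hypothetical sketch of the conversion proposed above: rewrite plain
# character keysyms between charset notation and Unicode notation depending
# on the keyboard mode. This is not existing loadkeys code.
# Assumptions: a single-byte charset (ISO-8859-1 by default), and only
# keysyms whose high byte is 0x00 are treated as charset characters.
from typing import List

UNICODE_THRESHOLD = 0x0C00  # values at or above this are in Unicode notation
UNICODE_MASK = 0xF000       # Unicode code points are stored XORed with this mask


def to_unicode_keysym(value: int, charset: str = "latin-1") -> int:
    """Charset notation -> Unicode notation (for UTF-8 mode)."""
    if value >> 8:
        # Already Unicode (>= 0x0c00) or a special keysym such as dead_tilde (0x0403).
        return value
    char = bytes([value]).decode(charset)
    return ord(char) ^ UNICODE_MASK


def to_charset_keysym(value: int, charset: str = "latin-1") -> int:
    """Unicode notation -> charset notation (for ASCII mode)."""
    if value < UNICODE_THRESHOLD:
        return value  # already charset-based or a special keysym
    char = chr(value ^ UNICODE_MASK)
    try:
        encoded = char.encode(charset)
    except UnicodeEncodeError:
        return value  # no equivalent in this charset: leave it unchanged
    return encoded[0]  # single-byte charset assumed


def adapt_keymap(values: List[int], utf8_mode: bool, charset: str = "latin-1") -> List[int]:
    """Convert the values of one keymap line to match the keyboard mode."""
    convert = to_unicode_keysym if utf8_mode else to_charset_keysym
    return [convert(v, charset) for v in values]


if __name__ == "__main__":
    keycode3 = [0x00E9, 0x0032, 0x0403]  # eacute two dead_tilde
    utf8 = adapt_keymap(keycode3, utf8_mode=True)
    print([hex(v) for v in utf8])                                   # ['0xf0e9', '0xf032', '0x403']
    print([hex(v) for v in adapt_keymap(utf8, utf8_mode=False)])    # ['0xe9', '0x32', '0x403']
```

With a conversion like this, a single fr-latin1 keymap would serve both modes. Keysyms of other types are left untouched here, which a real implementation may need to handle differently.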