Thank you for reviewing a number of patches related to `bind'. This is the final report related to `bind' that I currently have.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

The output of `bind -psX' for bindings whose keyseq contains <0x1c> (C-\) cannot be reused as input for bind or inputrc. There are also other problems.

This is related to two inconsistent representations of <0x1c>. In several places in the current Bash code, one can find two different assumptions about the representation of <0x1c> in untranslated keyseqs, `\C-\' and `\C-\\':

* rl_parse_and_bind: When the bind command extracts KEYSEQ from an argument of the form '"KEYSEQ":...', it skips every combination `\ + (any character)'. This implies that the form `\C-\' is not compatible with `rl_parse_and_bind', because the keyseq cannot be properly extracted from, e.g., '"\C-\":...'. So this function prefers the representation `\C-\\'.

* rl_translate_keyseq: However, the current implementation of `rl_translate_keyseq' interprets `\C-\' as <0x1c>. If the input `\C-\\' is given, it is translated to the two bytes `<0x1c> + <\>' (from Bash 5.0).

* rl_function_dumper: `bind -p' outputs `"\C-\\"' for the keyseq "\x1c". However, if the keyseq contains more than one byte, `bind -p' prints `\C-\\' only for the last <0x1c> and `\C-\' for the other <0x1c>. Also, if the binding is a shadow binding, `bind -p' prints `\C-\' for all the <0x1c>.

* rl_macro_dumper: In contrast, `bind -sX' consistently prints `\C-\\' for every <0x1c>.

This inconsistency has actually existed since Bash 3.0, but it caused problems only in very limited cases, because in Bash 4.4 and before, trailing backslashes in an untranslated keyseq were simply ignored by `rl_translate_keyseq'.
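
As an illustration of the KEYSEQ extraction rule in the first item of the list above, here is a minimal sketch in bash. It is only a simplified model of the rule "skip `\ + (any character)' while looking for the closing double quote" and is not readline's actual code; the function name `extract_keyseq' is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not readline's implementation) of how
# KEYSEQ is cut out of an argument of the form "KEYSEQ":...
# extract_keyseq is a hypothetical helper name used only for illustration.
extract_keyseq() {
  local s=$1 i out=
  # The argument must start with an opening double quote.
  [[ ${s:0:1} == '"' ]] || return 1
  for ((i = 1; i < ${#s}; i++)); do
    case ${s:i:1} in
      '\')
        # Backslash + any character is copied as one unit,
        # so an escaped double quote never closes the keyseq.
        out+=${s:i:2}
        ((i++))
        ;;
      '"')
        # The first unescaped double quote terminates the keyseq.
        printf '%s\n' "$out"
        return 0
        ;;
      *)
        out+=${s:i:1}
        ;;
    esac
  done
  # No closing quote was found.
  return 1
}

extract_keyseq '"\C-\\":"hello"'   # prints \C-\\
extract_keyseq '"\C-\":"hello"'    # prints \C-\":
```

In this model the second call swallows the intended closing quote and the colon, which is why the form `\C-\' cannot be written directly inside '"KEYSEQ":...'.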

Because trailing backslashes were ignored, the untranslated keyseq `\C-\\' was interpreted as <C-\> (+ <ignored \>) by `rl_translate_keyseq', i.e., it behaved as if it supported the representation `\C-\\'. Also, `bind -p' only outputs `\C-\' for shadow bindings or multibyte keyseqs. Nevertheless, there were still cases where multibyte keyseqs caused unexpected results if one assumed the representation `\C-\\'. For example, the keyseq `\C-\\a' is translated to <C-\> + <C-g (\a)> rather than the expected <C-\> + <a>.

The situation changed with a fix in Bash 5.0, after which trailing backslashes are preserved by `rl_translate_keyseq'. The fix was introduced in commit af2a77fb (the bash-20170505 snapshot). After this fix, `\C-\\' is translated to `<0x1c> + <\>' by `rl_translate_keyseq'. This breaks code that assumes the representation `\C-\\', as well as code that reuses the output of `bind -psX'. It should be noted that code assuming `\C-\' does not work either, because `bind '"\C-\":...'' is not parsed as expected: \" is treated as a unit in `rl_parse_and_bind'.

In fact, my Bash configuration broke in Bash 5.0 because I thought `\C-\\' from the output of `bind -psX' was the correct way to write it (I later switched to `\x1c' as a workaround). This is how I noticed the inconsistency.

Related to the handling of escape sequences in `rl_translate_keyseq', there are also other problems: the interaction of \C-\M-x and \M-\C-x with the readline setting `convert-meta' is not properly implemented.

Repeat-By:

Details are described in the "Description" section above. If that description is enough, please skip this section.

Example 1: `bind -psX'

In the following example, `bind -psX' is called for a single-byte case and a double-byte case. The behavior is the same for all Bash versions from 3.0 to 5.0 and for the devel branch (except that `bind -X' is only supported from Bash 4.3).

  $ cat test1a.sh
  bind '"\x1c":self-insert'
  bind -p | grep '\\C-\\'
  bind '"\x1c":"hello"'
  bind -s
  bind -x '"\x1c":echo world'
  bind -X
  exit

  $ cat test1b.sh
  bind '"\x1c\x1c":self-insert'
  bind -p | grep '\\C-\\'
  bind '"\x1c\x1c":"hello"'
  bind -s
  bind -x '"\x1c\x1c":echo world'
  bind -X
  exit

  $ LANG=C bash --rcfile test1a.sh
  "\C-\\": self-insert
  "\C-\\": "hello"
  "\C-\\": "echo world"       <-- all of `bind -psX' print \C-\\

  $ LANG=C bash --rcfile test1b.sh
  "\C-\\C-\\": self-insert    <-- `bind -p' prints *\C-\* + \C-\\
  "\C-\\\C-\\": "hello"
  "\C-\\\C-\\": "echo world"  <-- `bind -sX' prints \C-\\ + \C-\\

Example 2: bind '"\C-\...":...'

In the following, the behavior of `rl_translate_keyseq' is tested. The behavior is the same from Bash 3.0 to 4.4 but changed in Bash 5.0.

  $ cat test2.sh
  bind '"\C-\\":"hello"'
  bind '"\C-\a":"bash"'
  bind '"\C-\\a":"world"'
  bind -s
  exit

  $ LANG=C bash-4.4 --rcfile test2.sh
  "\C-\\\C-g": "world"  <-- `\C-\\a' is treated as <0x1c><C-g>
  "\C-\\a": "bash"      <-- `\C-\a' is treated as <0x1c><a>
  "\C-\\": "hello"      <-- `\C-\\' is treated as <0x1c>

  $ LANG=C bash-5.0 --rcfile test2.sh
  "\C-\\\C-g": "world"
  "\C-\\\\": "hello"    <-- The treatment of the trailing backslash changed
  "\C-\\a": "bash"

Example 3: bind '"\C-\":...'

In the following, the behavior of `rl_parse_and_bind' is tested. The behavior is essentially the same from Bash 3.0 to 5.0 and in the devel branch (except that an error message is printed from Bash 4.4).

  $ cat test3.sh
  bind '"\C-\":"hello"'
  bind -s
  exit

  $ LANG=C bash-4.3 --rcfile test3.sh

  $ LANG=C bash-4.4 --rcfile test3.sh
  readline: "\C-\":"hello": no key sequence terminator

There are still other problems in `rl_translate_keyseq', as follows:

Example 4: bind '"\M-\C-t":...'

When the readline variable `convert-meta' is set to `on', `\M-\C-t' is not properly handled in the current devel branch, as noted in this comment in the source code:

  lib/readline/bind.c:547: /* XXX - doesn't yet handle \M-\C-n if convert-meta is on */

It works fine with release versions of Bash from 3.2 to 5.0 (actually Bash 3.1 and 3.2 seem to have bugs for this).

  $ cat test4.sh
  bind 'set convert-meta off'
  bind '"\M-\C-t":"hello"'
  bind 'set convert-meta on'
  bind '"\M-\C-t":"world"'
  bind -s
  exit

  $ LANG=C ./bash-5.0 --rcfile test4.sh
  "\e\C-t": "world"
  "\224": "hello"

  $ LANG=C ./bash-3a7c642e --rcfile test4.sh
  "\e\\C-t": "world"  <-- `\M-\C-t' becomes 5 keys <ESC><\><C><-><t>
  "\224": "hello"

Example 5: bind '"\C-\M-t":...'

The Meta of `\C-\M-t' is always converted to ESC regardless of the `convert-meta' setting. This is reproducible in all Bash versions from 3.0 to 5.0 and in the current devel branch.

  $ cat test5.sh
  bind 'set convert-meta off'
  bind '"\C-\M-t":"hello"'
  bind 'set convert-meta on'
  bind '"\C-\M-t":"world"'
  bind -s
  exit

  $ LANG=C ./bash-5.0 --rcfile test5.sh
  "\e\C-t": "world"

  $ LANG=C ./bash-3a7c642e --rcfile test5.sh
  "\e\C-t": "world"  <-- "hello" is also bound to \e\C-t so it is overwritten

Fix:

There are two options for fixing this inconsistency: normalize the representation to `\C-\', or normalize it to `\C-\\'. I think it should be normalized to `\C-\\' for the following reasons:

* Because `\C-\M-x' and `\M-\C-x' are valid untranslated sequences representing one key, it is natural and consistent to accept another backslash escape sequence after `\C-' or `\M-' rather than to accept just one backslash character. In addition, if one adopted the option `\C-\', <C-M-x> (1 byte keyseq) and <C-\ M - x> (4 byte keyseq) would be in principle indistinguishable, as both are represented as `\C-\M-x'.

* Bash 4.4 and before behaved as if they supported the representation `\C-\\' in the simple cases, so choosing the option `\C-\\' will not break much existing code.
  If the option `\C-\' is chosen, it can break existing code written for Bash 4.4 and before.

* If one normalizes them to `\C-\', the code that extracts KEYSEQ from '"KEYSEQ":...' (rl_parse_and_bind) becomes complicated. The multibyte sequences \C-\M-\, \M-\C-\, \C-\, and \M-\ have to be checked explicitly so that the closing double quote is not mistakenly quoted by the trailing backslash of these sequences.

I attach a patch that normalizes the representation to `\C-\\'. The patch modifies the functions `rl_translate_keyseq' and `rl_function_dumper'. While modifying these functions, I noticed that `rl_translate_keyseq' still has other problems (see Examples 4 and 5 in the "Repeat-By" section above), so I decided to clean up the loop structure of `rl_translate_keyseq' to fix all of them.

The change has the following additional consequences:

* \M-\a is now treated as a single key <M-C-g>, which is related to the following comment in the original `rl_translate_keyseq':

    /* This doesn't yet handle things like \M-\a, which may or may not
       have any reasonable meaning. You're probably better off using
       straight octal or hex. */

* Duplicate prefixes such as \C-\C-\M-\M-t are allowed and treated as if each were specified once, i.e., \C-\C-t is <C-t>, \M-\M-t is <M-t>, and \C-\M-\M-\C-t is <C-M-t>. Even if this behavior is not desirable, it is easy to add code that checks for duplicate prefixes.

Thank you,
Koichi

0001-lib-readline-bind-fix-treatment-of-escape-sequences.patch