ksh: fix input handling for 4 byte UTF-8 sequences

Sören Tempel Sun, 04 Apr 2021 12:39:29 -0700

Hello,

Currently, ksh does not correctly calculate the length of 4-byte UTF-8
sequences in emacs input mode. For demonstration purposes, try entering
an emoji (e.g. U+1F421) at your shell prompt. The lead byte of a 4-byte
sequence can be identified by checking that the first four bits are set
and the fifth bit is not (bit pattern 11110xxx). The current check,
c == 0xf4, only matches one such lead byte, so sequences starting with
0xf0 through 0xf3 are not recognized (U+1F421, for instance, is encoded
as f0 9f 90 a1).


The patch below fixes this, allowing users to enter emojis (and other
4-byte UTF-8 sequences) at their shell prompt in emacs mode:

Greetings,
Sören

diff --git bin/ksh/emacs.c bin/ksh/emacs.c
index 694c402ff..970a0989d 100644
--- bin/ksh/emacs.c
+++ bin/ksh/emacs.c
@@ -1851,7 +1851,7 @@ x_e_getu8(char *buf, int off)
                return -1;
        buf[off++] = c;
 
-       if (c == 0xf4)
+       if ((c & 0xf8) == 0xf0)
                len = 4;
        else if ((c & 0xf0) == 0xe0)
                len = 3;