Hi,

Anton Lindqvist wrote on Sun, Jan 22, 2017 at 02:57:12PM +0100:

> I recently encountered a bug related to UTF-8 in ksh(1).
>
> While inserting the following sequence, part of my prompt gets mangled:
>
>   a<backward-char>ö
>
> With PS1='ksh$ ' I expect the following output:
>
>   ksh$ öa
>
> ... actual output:
>
>   kshöaa

I cannot reproduce. It works for me on OpenBSD-current (amd64).
Which version of OpenBSD are you using?

> Examining the output buffer when the 'ö' character is inserted
> shows the following, piped through hexdump:
>
>   00000000 c3 61 08 |.a.|
>   00000003
>
> 0xc3 is the first byte of the 'ö' character and the trailing
> backspace (0x08) causes the cursor to move past the incomplete UTF-8
> sequence.

I don't understand what you are talking about here.
In particular, what is that "output buffer" you are talking about?

> The backspace is emitted by the following lines in function x_ins:
>
>   $ sed -n 460,464p /usr/src/bin/ksh/emacs.c
>   if (adj == x_adj_done) {
>           /* no */
>           for (cp = xlp; cp > xcp; )
>                   x_bs(*--cp);
>   }
>
> A solution would be to only emit a backspace if cp[-1] is a UTF-8
> continuation byte and cp[-2] a UTF-8 start byte. This removes one of
> erroneous backspaces that eats the prompt.
>
> Examining the output buffer when the last byte (0xb6) of 'ö' is
> inserted:
>
>   00000000 08 c3 b6 61 08 |...a.|
>
> The leading erroneous backspace is caused by the following lines in
> function x_zots, introduced in r1.64:
>
>   $ sed -n 687,691p bin/ksh/emacs.c
>   if (str > xbuf && isu8cont(*str)) {
>           while (str > xbuf && isu8cont(*str))
>                   str--;
>           x_e_putc('\b');
>   }
>
> I haven't found any viable solution to not emit the backspace if a
> character is prepended, as opposed of appended.
>
> Any ideas on how to solve this issue would be much appreciated.

I neither understand the problem nor any part of your analysis.

Sorry,
  Ingo