On Sat, Apr 05, 2025 at 09:36:07AM +0200, Walter Alejandro Iglesias wrote:
> Hi again Lucas,
>
> This time I paid a little more attention :-). Maybe I'm missing
> something, but it seems to me that, in your patch, the skip_utf8_cont
> variable is unnecessary.
>
> Anyway, at first I'd also tried doing something similar to what you
> suggested, I still think it doesn't require so much fuss. Let's see if
> I don't make any stupid mistakes with this new version of mine:

anton pointed out the same thing in a private email and also requested
regress modifications. The current patch is:

diff refs/heads/master 563a90a52e59962e09d4d2c0897c06024dab84be
commit - 58fd8d0bdc1e6222119987e7aaad111eae245668
commit + 563a90a52e59962e09d4d2c0897c06024dab84be
blob - cdda9cb24b1a4a395547e081ff3adca380d3b6c1
blob + 0b6459cb5f31c2a8e00de8fb837b4e7039d59214
--- bin/ksh/vi.c
+++ bin/ksh/vi.c
@@ -1590,15 +1590,18 @@ backword(int argcnt)
 static int
 endword(int argcnt)
 {
-	int ncursor, skip_space, want_letnum;
+	int ncursor, skip_space, skip_utf8_cont, want_letnum;
 	unsigned char uc;
 
 	ncursor = es->cursor;
 	while (ncursor < es->linelen && argcnt--) {
-		skip_space = 1;
+		skip_space = skip_utf8_cont = 1;
 		want_letnum = -1;
 		while (++ncursor < es->linelen) {
 			uc = es->cbuf[ncursor];
+			if (skip_utf8_cont && isu8cont(uc))
+				continue;
+			skip_utf8_cont = 0;
 			if (isspace(uc)) {
 				if (skip_space)
 					continue;
@@ -1663,6 +1666,9 @@ Endword(int argcnt)
 	ncursor = es->cursor;
 	while (ncursor < es->linelen && argcnt--) {
 		while (++ncursor < es->linelen &&
+		    isu8cont((unsigned char)es->cbuf[ncursor]))
+			;
+		while (++ncursor < es->linelen &&
 		    isspace((unsigned char)es->cbuf[ncursor]))
 			;
 		while (ncursor < es->linelen &&
blob - 2c33d0005da16ffd525336ada48374de632235a9
blob + 348511d26252c3d5f1afe9b9cfecdf8da2bcc272
--- regress/bin/ksh/edit/vi.sh
+++ regress/bin/ksh/edit/vi.sh
@@ -87,6 +87,15 @@ testseq "1.00 two\00330ED" " # 1.00 two\b\r # 1.0
 # e: Move to end of word.
 testseq "onex two\00330eD" " # onex two\b\r # one     \b\b\b\b\b\b"
 
+# No infinite loop moving to end of {,big} word for non-ASCII UTF-8-ending
+# words.
+# EURO SIGN U+20AC is encoded as bytes 0xe2 0x82 0xac = \0342\0202\0254
+euro='\0342\0202\0254'
+testseq "1.00$euro 2.00 three\00330EED" \
+	" # 1.00$euro 2.00 three\b\r # 1.00$euro 2.0       \b\b\b\b\b\b\b\b"
+testseq "one$euro twox three\00330eeD" \
+	" # one$euro twox three\b\r # one$euro two       \b\b\b\b\b\b\b\b"
+
 # F: Find character backward.
 # ;: Repeat last search.
 # ,: Repeat last search in opposite direction.