On Thu, Mar 14, 2019 at 04:04:20PM +0100, Ingo Schwarze wrote:
> Hi,
>
> the following is a very simple patch to completely clean up the
> file less/search.c with respect to UTF-8 handling. It also fixes
> an outright bug: Searching for uppercase UTF-8 characters currently
> doesn't work because passing a Unicode codepoint (in this case, the
> "ch" retrieved with step_char()) to isupper(3) is just totally
> wrong.
>
> The new loop is fairly standard. Invalid bytes are simply skipped.
>
> OK?
> Ingo
>
Yes, OK.

> Index: search.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/less/search.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 search.c
> --- search.c	2 Aug 2017 19:35:57 -0000	1.19
> +++ search.c	14 Mar 2019 13:48:59 -0000
> @@ -75,12 +75,14 @@ static struct pattern_info filter_info;
>  static int
>  is_ucase(char *str)
>  {
> -	char *str_end = str + strlen(str);
> -	LWCHAR ch;
> +	wchar_t ch;
> +	int len;
>  
> -	while (str < str_end) {
> -		ch = step_char(&str, +1, str_end);
> -		if (isupper(ch))
> +	for (; *str != '\0'; str += len) {
> +		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
> +			mbtowc(NULL, NULL, MB_CUR_MAX);
> +			len = 1;
> +		} else if (iswupper(ch))
>  			return (1);
>  	}
>  	return (0);
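
For anyone reading along without the tree at hand, here is a minimal stand-alone sketch of the same loop. The function name has_upper() is hypothetical and not part of less; the body mirrors the patched is_ucase() above, with comments added. It assumes the caller has already selected a suitable LC_CTYPE locale.

```c
#include <stdlib.h>	/* mbtowc(), MB_CUR_MAX */
#include <wchar.h>	/* wchar_t */
#include <wctype.h>	/* iswupper() */

/*
 * has_upper() is a hypothetical name used for illustration only.
 * Return 1 if the multibyte string contains an uppercase character
 * according to the current LC_CTYPE locale, 0 otherwise.
 */
int
has_upper(const char *str)
{
	wchar_t ch;
	int len;

	for (; *str != '\0'; str += len) {
		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
			/* Invalid byte: reset the shift state, skip one byte. */
			mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
		} else if (iswupper(ch))
			return 1;
	}
	return 0;
}
```

Without an earlier setlocale(LC_CTYPE, "") call (or similar), the program runs in the "C" locale, so non-ASCII uppercase characters would not be recognized by this sketch.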