On Thu, Mar 14, 2019 at 04:04:20PM +0100, Ingo Schwarze wrote:
> Hi,
> 
> the following is a very simple patch to completely clean up the
> file less/search.c with respect to UTF-8 handling.  It also fixes
> an outright bug: Searching for uppercase UTF-8 characters currently
> doesn't work because passing a Unicode codepoint (in this case, the
> "ch" retrieved with step_char()) to isupper(3) is just totally
> wrong.
> 
> The new loop is fairly standard.  Invalid bytes are simply skipped.
> 
> OK?
>   Ingo
> 

Yes, OK.

> Index: search.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/less/search.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 search.c
> --- search.c  2 Aug 2017 19:35:57 -0000       1.19
> +++ search.c  14 Mar 2019 13:48:59 -0000
> @@ -75,12 +75,14 @@ static struct pattern_info filter_info;
>  static int
>  is_ucase(char *str)
>  {
> -     char *str_end = str + strlen(str);
> -     LWCHAR ch;
> +     wchar_t ch;
> +     int len;
>  
> -     while (str < str_end) {
> -             ch = step_char(&str, +1, str_end);
> -             if (isupper(ch))
> +     for (; *str != '\0'; str += len) {
> +             if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
> +                     mbtowc(NULL, NULL, MB_CUR_MAX);
> +                     len = 1;
> +             } else if (iswupper(ch))
>                       return (1);
>       }
>       return (0);
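
For anyone reading along in the archive, here is a minimal standalone
sketch of the loop from the patch, so it can be compiled outside of
less(1).  The function name has_upper_mb() is hypothetical and not part
of the patch; the body follows the patched is_ucase(), except that the
argument is const-qualified and ch is cast to wint_t for iswupper(3).

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical standalone version of the patched is_ucase() loop.
 * Returns 1 if str contains an uppercase character in the current
 * LC_CTYPE locale, 0 otherwise.
 */
static int
has_upper_mb(const char *str)
{
	wchar_t ch;
	int len;

	for (; *str != '\0'; str += len) {
		len = mbtowc(&ch, str, MB_CUR_MAX);
		if (len == -1) {
			/* Invalid or incomplete sequence: reset the
			 * conversion state and skip a single byte. */
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
		} else if (iswupper((wint_t)ch))
			return (1);
	}
	return (0);
}
```

Note that the result depends on the locale: a caller would need
setlocale(LC_CTYPE, "") with a UTF-8 locale in effect before non-ASCII
uppercase characters can be recognized.  In the default "C" locale only
ASCII letters are reliably classified.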
