Re: Fix broken UTF-8 decoding

Miod Vallat Mon, 06 Mar 2023 09:16:23 -0800

> Currently it is not possible to use Unicode codepoints > 0xFF on the console,
> because our UTF-8 decoding logic is badly broken.
> 
> The code in question is in wsemul_subr.c, wsemul_getchar().
> 
> The problem is that we calculate the number of bytes in a multi-byte
> sequence by just looking at the high bits in turn:
> 
>                       if (frag & 0x20) {
>                               frag &= ~0x20;
>                               mbleft++;
>                       }
>                       if (frag & 0x10) {
>                               frag &= ~0x10;
>                               mbleft++;
>                       }
>                       if (frag & 0x08) {
>                               frag &= ~0x08;
>                               mbleft++;
>                       }
>                       if (frag & 0x04) {
>                               frag &= ~0x04;
>                               mbleft++;
>                       }
> 
> This is wrong, for several reasons.


Doh! Thanks for noticing this. I have replaced that code with something
much saner now.
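
The replacement code is not quoted in this message. Purely as an
illustration, and assuming the quoted checks run unconditionally on the
lead byte, the trouble is that each bit is tested on its own: a two-byte
lead such as 0xC4 (binary 11000100) has the 0x04 bit set, so it is
counted as announcing an extra byte and loses a payload bit. A minimal
sketch of classifying the lead byte by its whole prefix instead could
look like the following. The name utf8_trail_count() is hypothetical and
is not taken from wsemul_subr.c.

```c
/*
 * Hypothetical helper, not the committed fix: return the number of
 * continuation bytes announced by a UTF-8 lead byte, or -1 if the byte
 * cannot start a sequence.  Each test compares the complete prefix, so
 * payload bits can never be mistaken for length bits.
 *
 * This sketch does not reject the overlong leads 0xC0/0xC1 or the
 * out-of-range leads 0xF5..0xF7; a real decoder should.
 */
static int
utf8_trail_count(unsigned char c)
{
	if ((c & 0x80) == 0x00)		/* 0xxxxxxx: ASCII, no trail bytes */
		return 0;
	if ((c & 0xe0) == 0xc0)		/* 110xxxxx: 2-byte sequence */
		return 1;
	if ((c & 0xf0) == 0xe0)		/* 1110xxxx: 3-byte sequence */
		return 2;
	if ((c & 0xf8) == 0xf0)		/* 11110xxx: 4-byte sequence */
		return 3;
	return -1;			/* continuation byte or invalid lead */
}
```

With this shape, 0xC4 yields 1 trail byte, so U+0100 (encoded as 0xC4
0x80) is no longer misread as the start of a longer sequence.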

Miod
