> Currently it is not possible to use unicode codepoints > 0xFF on the console,
> because our UTF-8 decoding logic is badly broken.
>
> The code in question is in wsemul_subr.c, wsemul_getchar().
>
> The problem is that we calculate the number of bytes in a multi-byte
> sequence by just looking at the high bits in turn:
>
> 	if (frag & 0x20) {
> 		frag &= ~0x20;
> 		mbleft++;
> 	}
> 	if (frag & 0x10) {
> 		frag &= ~0x10;
> 		mbleft++;
> 	}
> 	if (frag & 0x08) {
> 		frag &= ~0x08;
> 		mbleft++;
> 	}
> 	if (frag & 0x04) {
> 		frag &= ~0x04;
> 		mbleft++;
> 	}
>
> This is wrong, for several reasons.

Doh! Thanks for noticing this. I have replaced that code with something
much saner now.

Miod
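
For readers following the thread: in UTF-8, only the run of leading 1 bits
in the first byte encodes the sequence length, and every bit after the first
0 bit is payload. A lead byte such as 0xC4 (binary 11000100) carries 0x04 as
data, yet the quoted test on 0x04, taken on its own, would count it as one
more byte to expect. The sketch below is not the code that was committed to
wsemul_subr.c. It is only a minimal illustration, using a hypothetical helper
named utf8_lead_info(), of classifying the lead byte with masked comparisons
so that payload bits are never mistaken for length bits.

```c
/*
 * Hypothetical helper for illustration; not the actual wsemul_getchar() fix.
 *
 * Given the first byte of a UTF-8 sequence, return the number of
 * continuation bytes that must follow (0 to 3), or -1 if the byte
 * cannot start a sequence.  On success, *payload receives the data
 * bits carried by the lead byte itself.
 *
 * This only inspects the bit pattern.  A complete decoder would also
 * have to reject overlong forms (lead bytes 0xC0 and 0xC1) and values
 * above U+10FFFF (lead bytes 0xF5 to 0xF7), and check each
 * continuation byte for the 10xxxxxx pattern.
 */
static int
utf8_lead_info(unsigned char lead, unsigned int *payload)
{
	if ((lead & 0x80) == 0x00) {	/* 0xxxxxxx: single byte (ASCII) */
		*payload = lead;
		return 0;
	}
	if ((lead & 0xe0) == 0xc0) {	/* 110xxxxx: one byte follows */
		*payload = lead & 0x1f;
		return 1;
	}
	if ((lead & 0xf0) == 0xe0) {	/* 1110xxxx: two bytes follow */
		*payload = lead & 0x0f;
		return 2;
	}
	if ((lead & 0xf8) == 0xf0) {	/* 11110xxx: three bytes follow */
		*payload = lead & 0x07;
		return 3;
	}
	/* 10xxxxxx (stray continuation byte) or 11111xxx (never valid) */
	return -1;
}
```

With this shape, utf8_lead_info(0xc4, &p) reports one continuation byte and
leaves p at 0x04, because the mask for each case keeps the data bits intact
instead of testing them as if they were part of the length prefix.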