The bug is in the OP's code: &s is never passed to mbrtowc, so the mbstate_t s it declares goes unused. I'm on Ubuntu; I don't have Cygwin here.
I should consume some calories before trying to debug anything.

On Tue, Jul 28, 2009 at 6:14 AM, Corinna Vinschen <corinna-cyg...@cygwin.com> wrote:
> On Jul 27 22:56, Andy Koppe wrote:
>> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
>> Here's an example:
>>
>> #include <stdio.h>
>> #include <locale.h>
>> #include <stdlib.h>
>> #include <wchar.h>
>>
>> int main(void) {
>>   wchar_t wc;
>>   size_t ret;
>>   mbstate_t s = { 0 };
>>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>>   printf("%x\n", wc);
>>   return 0;
>> }
>>
>> The sequence E2 94 84 should translate to U+2514. Instead, the second
>> and third calls to mbrtowc report encoding errors. It does work
>> correctly if the three bytes are passed to mbrtowc() in one go:
>>
>>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
>
> That's a bug in the newlib function __utf8_mbtowc. I'm really surprised
> that this bug has never been reported before since it's in the code for
> years, probably since it has been introduced in 2002.
>
> I'll follow up on the newlib list.
>
> Thanks for the report and especially thanks for the testcase,
> Corinna

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple