On 07/22/2013 02:12 AM, Corinna Vinschen wrote:
>>> However, please note that this behaviour, while being provided by glibc
>>> and now by Cygwin, is *not* standards-compliant.  In the narrow sense
>>> the characters beyond 0x7f are still invalid ASCII chars, and other
>>> functions working with wchar_t strings won't be as forgiving when using
>>> invalid input.
>>>
> After some sleep, I think I now understand why the glibc devs made
> regcomp to work this way.  This behaviour is backward compatible to non
> locale-aware applications.  In the "C" locale, a char is just some
> arbitrary byte between 0 and 255.  So this pattern always worked before
> in the "C" locale, therefore it makes sense that it continues to work,
> even if it won't when using other locales/codesets.

By the way, there is currently a big debate going on in the Austin Group
(the people responsible for POSIX) on whether the "C" locale must be
8-bit clean (the way glibc behaves) or whether it was intended to allow
UTF-8 encoding by default (the way musl libc wants to behave); and
resolution of the debate will require input from the C standards
committee.  There may be some interesting fallout, no matter which
solution is finally reached.

http://austingroupbugs.net/view.php?id=663

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org