Forwarded to newlib.

----- Forwarded message from Eric Blake -----

> Date: Tue, 12 May 2009 16:02:04 +0000 (UTC)
> From: Eric Blake
> Subject: [1.7] wcwidth failing configure tests
> To: cygwin AT cygwin DOT com
>
> I noticed this failure in various configure scripts (findutils, coreutils,
> ...):
>
> checking whether wcwidth works reasonably in UTF-8 locales... no
>
> I've reduced it to a STC:
>
> #include <locale.h>
> #include <wchar.h>
> int main ()
> {
>   int i = 0;
>   if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL)
>     {
>       if (wcwidth (0x0301) > 0)
>         i |= 1;
>       if (wcwidth (0x200B) > 0)
>         i |= 2;
>     }
>   return i;
> }
>
> The return value should be 0 but is coming back as 3; 0x0301 is a combining
> mark which should occupy no space on its own, and 0x200b is a 0-width space,
> according to Unicode 5.1 (and earlier, to some extent).  And that probably
> means that other places within wcwidth() are broken.

----- End forwarded message -----

wcwidth returns 1 if iswprint returns true.  I made a quick attempt at
debugging this, and it turns out that the entire range 0x0300..0x034f is
marked as printable in the u3 array in libc/ctype/utf8print.h.  The
characters in the range 0x0300..0x034f are all combining characters, which
are printable but have zero width.  Likewise, 200b..200d are three
zero-width characters, and all three are also printable.

Scanning the Unicode 5.1 standard, I see a number of characters which are
printable but have zero width:

  0300..036f
  0483..0489
  200b..200f
  20d0..20ea
  3099..309a
  fe20..fe23  (not sure about them.  Each of them is one half of a full
               combined char which doesn't make sense alone, afaics)
  feff

and a couple of musical symbols in the 0x1d1xx range.

How can we fix this problem?  Should we hardcode a check for the above
character values in wcwidth?

And here's another question.  The utf8*.h files claim they have been
generated from the unicode.txt file of the Unicode 3.2 standard.  Do we
have the script which generated the utf8*.h files?  Can we regenerate the
files to match the current Unicode 5.1 standard?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
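
To make the "hardcode a check" option concrete, here is a rough sketch of
what such a check could look like.  It is not newlib's actual wcwidth.  The
names wc_range, zero_width, is_zero_width and sketch_wcwidth are made up for
illustration, the `printable' argument stands in for the result of
iswprint, and the 0 / -1 return values for the null character and for
non-printable characters are assumed from the usual POSIX behaviour.  The
table only covers the ranges listed in this mail; the musical symbols in
the 0x1d1xx range are left out because their exact code points are not
given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of zero-width ranges, copied from the list in this
   mail.  Incomplete: the 0x1d1xx musical symbols are not included. */
struct wc_range { unsigned long first, last; };

static const struct wc_range zero_width[] = {
  { 0x0300, 0x036f },
  { 0x0483, 0x0489 },
  { 0x200b, 0x200f },
  { 0x20d0, 0x20ea },
  { 0x3099, 0x309a },
  { 0xfe20, 0xfe23 },   /* combining half marks; inclusion is uncertain */
  { 0xfeff, 0xfeff },
};

/* Hypothetical helper: return 1 if wc falls into one of the ranges
   above, 0 otherwise.  A linear scan is enough for a table this small. */
static int
is_zero_width (unsigned long wc)
{
  size_t i;

  for (i = 0; i < sizeof zero_width / sizeof zero_width[0]; i++)
    {
      /* Sanity check on the table entry itself. */
      assert (zero_width[i].first <= zero_width[i].last);
      if (wc >= zero_width[i].first && wc <= zero_width[i].last)
        return 1;
    }
  return 0;
}

/* Hypothetical stand-in for wcwidth, showing where the check would go.
   `printable' represents the result of iswprint (wc). */
static int
sketch_wcwidth (unsigned long wc, int printable)
{
  if (wc == 0)
    return 0;               /* null wide character (assumed POSIX rule) */
  if (is_zero_width (wc))
    return 0;               /* printable, but occupies no column */
  return printable ? 1 : -1;
}
```

With a check like this in front of the iswprint test, both code points
from Eric's test case (0x0301 and 0x200B) would get width 0.  The drawback
is that the table has to be maintained by hand, which is why regenerating
the utf8*.h files from current Unicode data may be the better long-term
fix.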