On Sun, Aug 24, 2008 at 12:29:06PM +0200, Bruno Haible wrote:
> > + dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns -1.
>
> This is not the case for me:
I'm sorry. In my case it also gives 2, not -1. (I forgot to call setlocale
in the new test program, oops.) New patch attached.

> Which looks all fine. (Giving the BULLET a width of 2 is a bit strange, but
> not really wrong.)

Well, it does not seem to match current xterm behavior, and thus leads to
strange visual results. I don't know, maybe it is an xterm problem, but the
easiest way was to substitute wcwidth. Earlier I used my own autoconf tests
and an mk_wcwidth replacement, but recently I decided to move to gnulib
(with gnulib-tool).

> > The slowness is probably caused by checking the charset string every time
> > wcwidth is called. I'm not sure which way to fix it would be correct,
> > probably caching the check result will help.
>
> When would the cache be invalidated? You cannot hook into setlocale().

Unfortunately.

> > BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> > It's public domain.
>
> It has also its bugs [1]. Additionally, it's slower because it uses binary
> search rather than immediate table accesses.

Let's measure it.

$ time ./wcwidth-solaris
wcwidth(0x2022)=2

real    0m2.205s
user    0m2.200s
sys     0m0.000s

$ time ./wcwidth-rpl
wcwidth(0x2022)=1

real    0m55.477s
user    0m55.350s
sys     0m0.000s

$ time ./wcwidth-mk
wcwidth(0x2022)=1

real    0m1.944s
user    0m1.940s
sys     0m0.010s

So despite the binary search, the mk version is the fastest.

The test program:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>    /* declares wcwidth */

int main()
{
  int i, j;
  setlocale(LC_ALL, "en_US.UTF-8");
  printf("wcwidth(0x2022)=%d\n", wcwidth(0x2022));
  /* Call wcwidth on every BMP code point, 300 times over. */
  for (j = 0; j < 300; j++)
    for (i = 0; i < 0x10000; i++)
      wcwidth(i);
  return 0;
}

--
Alexander.
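On the cache invalidation question: since setlocale() cannot be hooked, one possible approach (a sketch under assumptions, not what gnulib currently does) is to compare the current LC_CTYPE locale name, which setlocale (LC_CTYPE, NULL) returns without changing anything, against the name the cached answer was computed for, and redo the charset lookup only when the name differs. The function name locale_is_utf8, the 256-byte buffer and the two accepted codeset spellings are hypothetical choices. The sketch is not thread-safe, does not notice per-thread locales installed with uselocale(), and whether a strcmp per call is actually cheaper than the existing charset check has not been measured.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical cache: the LC_CTYPE locale name the cached answer was
   computed for, and the answer itself (-1 means "nothing cached").  */
static char cached_locale[256];
static int cached_is_utf8 = -1;

/* Return 1 if the current LC_CTYPE charset is UTF-8, 0 otherwise.
   The charset is looked up again only when the LC_CTYPE locale name
   differs from the one seen before, because there is no way to be
   notified when setlocale () is called.
   Not thread-safe; ignores per-thread locales set by uselocale ().  */
int
locale_is_utf8 (void)
{
  /* Passing NULL only queries the current locale name.  */
  const char *name = setlocale (LC_CTYPE, NULL);
  const char *codeset;
  int is_utf8;

  if (name == NULL)
    name = "";

  /* Fast path: same locale name as last time, reuse the answer.  */
  if (cached_is_utf8 >= 0 && strcmp (name, cached_locale) == 0)
    return cached_is_utf8;

  /* Slow path: look up the charset.  Only two spellings are accepted
     here (an assumption); gnulib's locale_charset () normalizes more.  */
  codeset = nl_langinfo (CODESET);
  is_utf8 = (strcmp (codeset, "UTF-8") == 0 || strcmp (codeset, "utf8") == 0);

  /* Cache the answer if the name fits; otherwise leave the cache
     empty so that the next call recomputes.  */
  if (strlen (name) < sizeof cached_locale)
    {
      strcpy (cached_locale, name);
      cached_is_utf8 = is_utf8;
    }
  else
    cached_is_utf8 = -1;

  return is_utf8;
}
```

A wcwidth replacement could then call locale_is_utf8 () on entry instead of re-deriving the charset on every call; after a setlocale() that changes LC_CTYPE, the differing name forces one fresh lookup.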
diff --git a/m4/wcwidth.m4 b/m4/wcwidth.m4
index 04a9fc2..7793002 100644
--- a/m4/wcwidth.m4
+++ b/m4/wcwidth.m4
@@ -38,6 +38,7 @@ AC_DEFUN([gl_FUNC_WCWIDTH],
     else
       dnl On MacOS X 10.3, wcwidth(0x0301) (COMBINING ACUTE ACCENT) returns 1.
       dnl On OSF/1 5.1, wcwidth(0x200B) (ZERO WIDTH SPACE) returns 1.
+      dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns 2.
       dnl This leads to bugs in 'ls' (coreutils).
       AC_CACHE_CHECK([whether wcwidth works reasonably in UTF-8 locales],
         [gl_cv_func_wcwidth_works],
@@ -64,7 +65,7 @@ int wcwidth (int);
 int main ()
 {
   if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL)
-    if (wcwidth (0x0301) > 0 || wcwidth (0x200B) > 0)
+    if (wcwidth (0x0301) > 0 || wcwidth (0x200B) > 0 || wcwidth(0x2022) != 1)
       return 1;
   return 0;
 }], [gl_cv_func_wcwidth_works=yes], [gl_cv_func_wcwidth_works=no],
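To check a particular platform by hand, the probe from the patched configure test can be pulled out into a small C function. This is a sketch that mirrors the patch: it reports "broken" only if the UTF-8 locale exists and one of the three probed characters has an unexpected width. The name wcwidth_is_broken is hypothetical, and _XOPEN_SOURCE is defined because glibc declares wcwidth in <wchar.h> only when X/Open features are requested.

```c
#define _XOPEN_SOURCE 700   /* glibc declares wcwidth () only with X/Open */
#include <assert.h>
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper mirroring the patched configure probe.
   Returns 1 if UTF8_LOCALE can be set and wcwidth () gives an
   unexpected width for any probed character, 0 otherwise (including
   when the locale does not exist, as in the configure test).
   Side effect: switches the global locale to UTF8_LOCALE if it exists.  */
static int
wcwidth_is_broken (const char *utf8_locale)
{
  assert (utf8_locale != NULL);
  if (setlocale (LC_ALL, utf8_locale) == NULL)
    return 0;
  return wcwidth (0x0301) > 0      /* COMBINING ACUTE ACCENT: expect 0 */
         || wcwidth (0x200B) > 0   /* ZERO WIDTH SPACE: expect 0 */
         || wcwidth (0x2022) != 1; /* BULLET: expect 1 */
}
```

For example, a main that returns wcwidth_is_broken ("fr_FR.UTF-8") exits with status 1 exactly when the patched configure test would report that wcwidth does not work reasonably.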