On Sun, Aug 24, 2008 at 12:29:06PM +0200, Bruno Haible wrote:
> > +    dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns -1.
> 
> This is not the case for me:

I'm sorry. In my case it also gives 2, not -1. (I forgot to call setlocale
in the new test program, oops). New patch attached.

> Which looks all fine. (Giving the BULLET a width of 2 is a bit strange, but
> not really wrong.)

Well, it does not seem to match current xterm behavior, and thus leads to
strange visual results. I don't know, maybe it is an xterm problem, but the
easiest way was to substitute wcwidth.

Earlier I used my own autoconf tests and an mk_wcwidth replacement, but I
recently decided to move to gnulib (with gnulib-tool).

> > The slowness is probably caused by checking the charset string every time
> > wcwidth is called. I'm not sure which way to fix it would be correct,
> > probably caching the check result will help.
> 
> When would the cache be invalidated? You cannot hook into setlocale().

Unfortunately.
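
One possible workaround, as a rough sketch only: since setlocale() cannot be
hooked, the cache could be keyed on the locale name returned by
setlocale(LC_CTYPE, NULL) and recomputed whenever that name changes. The
function name locale_is_utf8_cached and the substring test for "UTF-8" are
assumptions made up for this illustration, not what gnulib does. Note that
this still costs one setlocale() query and one strcmp() per call.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical cache: the last locale name seen and the result for it. */
static char cached_name[256];
static int cached_is_utf8 = -1;   /* -1 means "not computed yet" */

/* Returns 1 if the current LC_CTYPE locale name looks like UTF-8.
   Assumption: a plain substring match on the locale name is good enough
   for the sketch; real code would inspect the codeset properly. */
int locale_is_utf8_cached(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);   /* query only */
    if (name == NULL)
        name = "";

    /* Recompute only when the locale name differs from the cached one. */
    if (cached_is_utf8 < 0 || strcmp(name, cached_name) != 0) {
        strncpy(cached_name, name, sizeof cached_name - 1);
        cached_name[sizeof cached_name - 1] = '\0';
        cached_is_utf8 = (strstr(name, "UTF-8") != NULL
                          || strstr(name, "utf8") != NULL);
    }
    return cached_is_utf8;
}
```

Whether the name comparison is actually cheaper than the existing charset
check would have to be measured.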

> > BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> > It's public domain.
> 
> It has also its bugs [1]. Additionally, it's slower because it uses binary
> search rather than immediate table accesses.

Let's measure it.

$ time ./wcwidth-solaris 
wcwidth(0x2022)=2

real    0m2.205s
user    0m2.200s
sys     0m0.000s

$ time ./wcwidth-rpl 
wcwidth(0x2022)=1

real    0m55.477s
user    0m55.350s
sys     0m0.000s

$ time ./wcwidth-mk 
wcwidth(0x2022)=1

real    0m1.944s
user    0m1.940s
sys     0m0.010s

So despite the binary search, the mk version is the fastest. The test program:
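#define _XOPEN_SOURCE 500   /* some systems need this to declare wcwidth */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>          /* wcwidth */
int main(void)
{
   int i,j;
   /* wcwidth depends on LC_CTYPE, so the locale must be set first */
   setlocale(LC_ALL,"en_US.UTF-8");
   printf("wcwidth(0x2022)=%d\n",wcwidth(0x2022));
   /* 300 passes over the whole BMP */
   for(j=0; j<300; j++)
      for(i=0; i<0x10000; i++)
         wcwidth(i);
   return 0;
}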
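
For context, the binary search mentioned above can be sketched roughly as
follows. This is an illustration of the general technique only, not code
copied from wcwidth.c: the names in_table and sketch_width are made up, and
the table is a tiny subset chosen to cover the code points discussed in this
thread.

```c
#include <stddef.h>

struct interval { unsigned int first, last; };

/* Illustrative subset, sorted by code point; NOT a complete table. */
static const struct interval zero_width[] = {
    { 0x0300, 0x036F },   /* combining diacritical marks */
    { 0x200B, 0x200F },   /* zero width space and neighbours */
};

/* Binary search: returns 1 if ucs falls inside one of the n sorted,
   non-overlapping intervals in table, 0 otherwise. */
int in_table(unsigned int ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > table[mid].last)
            lo = mid + 1;
        else if (ucs < table[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Simplified width: 0 for the listed zero-width ranges, 1 for everything
   else (wide and control characters are ignored in this sketch). */
int sketch_width(unsigned int ucs)
{
    size_t n = sizeof zero_width / sizeof zero_width[0];
    return in_table(ucs, zero_width, n) ? 0 : 1;
}
```

With this subset, sketch_width(0x0301) and sketch_width(0x200B) give 0 and
sketch_width(0x2022) gives 1. The number of comparisons grows only with the
logarithm of the table size, which may explain why it holds up well in the
timings.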

-- 
   Alexander.
diff --git a/m4/wcwidth.m4 b/m4/wcwidth.m4
index 04a9fc2..7793002 100644
--- a/m4/wcwidth.m4
+++ b/m4/wcwidth.m4
@@ -38,6 +38,7 @@ AC_DEFUN([gl_FUNC_WCWIDTH],
   else
     dnl On MacOS X 10.3, wcwidth(0x0301) (COMBINING ACUTE ACCENT) returns 1.
     dnl On OSF/1 5.1, wcwidth(0x200B) (ZERO WIDTH SPACE) returns 1.
+    dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns 2.
     dnl This leads to bugs in 'ls' (coreutils).
     AC_CACHE_CHECK([whether wcwidth works reasonably in UTF-8 locales],
       [gl_cv_func_wcwidth_works],
@@ -64,7 +65,7 @@ int wcwidth (int);
 int main ()
 {
   if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL)
-    if (wcwidth (0x0301) > 0 || wcwidth (0x200B) > 0)
+    if (wcwidth (0x0301) > 0 || wcwidth (0x200B) > 0 || wcwidth(0x2022) != 1)
       return 1;
   return 0;
 }], [gl_cv_func_wcwidth_works=yes], [gl_cv_func_wcwidth_works=no],
