Hi,

Alexander V. Lukyanov wrote:
> I'm trying to use wcwidth replacement on Solaris 8 and have noticed some
> problems. At first, the test for replacement did not detect the problem and
> thus did not replace the system function (patch for this is attached).
The comment in your diff says:
> + dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns -1.

This is not the case for me:

$ uname -srm
SunOS 5.8 sun4u
$ cat foo.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
int main ()
{
  if (setlocale (LC_ALL, "") == NULL)
    {
      printf ("bad locale\n");
      exit (1);
    }
  printf ("wcwidth (0x00AB) = %d\n", wcwidth (0x00AB));
  printf ("wcwidth (0x00BB) = %d\n", wcwidth (0x00BB));
  printf ("wcwidth (0x2022) = %d\n", wcwidth (0x2022));
  printf ("wcwidth (0xd856) = %d\n", wcwidth (0xd856));
  return 0;
}
$ export LC_ALL=fr_FR.UTF-8
$ ./a.out
wcwidth (0x00AB) = 1
wcwidth (0x00BB) = 1
wcwidth (0x2022) = 2
wcwidth (0xd856) = -1

This all looks fine. (Giving the BULLET a width of 2 is a bit strange, but
not really wrong.) Can you show the results of the same test program on your
Solaris 8 machine?

> Then I have noticed that the replacement function is slow

Correct. Do you have suggestions for speeding up the replacement function?

> and broken.
>
> At least rpl_wcwidth(0x00AB) returns 0, but it should return 1 for the
> character.
> 0x00AB is LEFT-POINTING DOUBLE ANGLE QUOTATION MARK.

Oops, right. I'm fixing it through the attached patch and adding a unit test
for it.

> The slowness is probably caused by checking the charset string every time
> wcwidth is called. I'm not sure which way to fix it would be correct, probably
> caching the check result will help.

When would the cache be invalidated? You cannot hook into setlocale().

> BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> It's public domain.

It has its own bugs [1]. Additionally, it's slower because it uses binary
search rather than immediate table accesses.

Thanks for the reports!

Bruno

[1] http://mail.nl.linux.org/linux-utf8/2007-07/msg00000.html


2008-08-24  Bruno Haible  <[EMAIL PROTECTED]>

	Fix uc_width(0x00AB) bug, introduced on 2007-07-08.
	* lib/uniwidth/width.c (nonspacing_table_data): Set bit for 0x00AD, not
	0x00AB.
	Reported by Alexander V.
	Lukyanov <[EMAIL PROTECTED]>.

--- lib/uniwidth/width.c.orig	2008-08-24 12:26:12.000000000 +0200
+++ lib/uniwidth/width.c	2008-08-24 11:47:40.000000000 +0200
@@ -1,5 +1,5 @@
 /* Determine display width of Unicode character.
-   Copyright (C) 2001-2002, 2006-2007 Free Software Foundation, Inc.
+   Copyright (C) 2001-2002, 2006-2008 Free Software Foundation, Inc.
    Written by Bruno Haible <[EMAIL PROTECTED]>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -36,7 +36,7 @@
   /* 0x0000-0x01ff */
   0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, /* 0x0000-0x003f */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, /* 0x0040-0x007f */
-  0xff, 0xff, 0xff, 0xff, 0x00, 0x08, 0x00, 0x00, /* 0x0080-0x00bf */
+  0xff, 0xff, 0xff, 0xff, 0x00, 0x20, 0x00, 0x00, /* 0x0080-0x00bf */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00c0-0x00ff */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0100-0x013f */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0140-0x017f */