Re: wcwidth replacement problems

Bruno Haible Tue, 26 Aug 2008 00:33:10 -0700

Alexander V. Lukyanov wrote:
> > (Giving the BULLET a width of 2 is a bit strange, but not really wrong.)
> 
> Well, it does not seem to match current xterm behavior, and thus leads to
> strange visual results. I don't know, maybe it is an xterm problem, but the
> easiest way was to substitute wcwidth.


Probably the Solaris wcwidth is made to match some Japanese terminal
emulators, rather than xterm? In such terminal emulators, many characters
that have width 1 in xterm are represented with width 2.

U+2022 (BULLET) is designated as "ambiguous width" in Unicode 5.0.0
(ftp.unicode.org ArchiveVersions/5.0.0/ucd/extracted/DerivedEastAsianWidth.txt);
therefore I don't want to consider Solaris wrong here. You have to understand
that wcwidth is only an approximation, because different terminal emulators
behave differently.
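
To illustrate why neither answer is "wrong", here is a minimal sketch, not
taken from Solaris, gnulib, or xterm. It assumes the caller already knows
whether the terminal follows East Asian conventions. The names
resolve_width and ambiguous_sample are hypothetical, and the table lists
only U+2022, the one character discussed above.

```c
#include <stddef.h>

/* Hypothetical, deliberately tiny table: only U+2022 (BULLET) is listed,
   since that is the ambiguous-width character discussed in this thread.
   A real table would be derived from the Unicode data files. */
static const unsigned int ambiguous_sample[] = { 0x2022 };

/* Returns 1 if uc appears in the sample table, 0 otherwise. */
static int is_ambiguous_sample(unsigned int uc)
{
    size_t n = sizeof ambiguous_sample / sizeof ambiguous_sample[0];
    for (size_t i = 0; i < n; i++)
        if (ambiguous_sample[i] == uc)
            return 1;
    return 0;
}

/* Hypothetical helper: base_width is the width for non-ambiguous
   characters (supplied by the caller); east_asian_terminal is nonzero
   if the terminal draws ambiguous characters two columns wide. */
int resolve_width(unsigned int uc, int base_width, int east_asian_terminal)
{
    if (is_ambiguous_sample(uc))
        return east_asian_terminal ? 2 : 1;
    return base_width;
}
```

With this sketch, resolve_width(0x2022, 1, 0) gives the xterm-like answer
and resolve_width(0x2022, 1, 1) gives the answer Solaris reports.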

> > > BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> > > It's public domain.
> > 
> > It has also its bugs [1]. Additionally, it's slower because it uses binary
> > search rather than immediate table accesses.
> 
> Let's measure it.
> 
> $ time ./wcwidth-solaris 
> wcwidth(0x2022)=2
> 
> real    0m2.205s
> user    0m2.200s
> sys     0m0.000s
> 
> $ time ./wcwidth-rpl 
> wcwidth(0x2022)=1
> 
> real    0m55.477s
> user    0m55.350s
> sys     0m0.000s
> 
> $ time ./wcwidth-mk 
> wcwidth(0x2022)=1
> 
> real    0m1.944s
> user    0m1.940s
> sys     0m0.010s

This is not a fair comparison: wcwidth-mk works only in UTF-8 locales,
whereas wcwidth() from the system and from gnulib return the right result
in all locales. The test for whether the locale encoding is UTF-8 is precisely
what takes up most of the time in the gnulib replacement.
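
For readers who want to see what such a locale test can look like, here is
a rough sketch using the POSIX nl_langinfo(CODESET) call. It is not the
gnulib code; the function names codeset_is_utf8 and current_locale_is_utf8
are hypothetical, and the accepted spellings "UTF-8" and "UTF8" are an
assumption about what platforms report.

```c
#include <langinfo.h>
#include <strings.h>

/* Returns 1 if the codeset name denotes UTF-8, 0 otherwise.
   Assumption: platforms spell it "UTF-8" or "UTF8", in any letter case. */
int codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0
           || strcasecmp(codeset, "UTF8") == 0;
}

/* Asks the C library for the codeset of the current LC_CTYPE locale
   and classifies it. Doing this once per character is the kind of
   per-call cost described above. */
int current_locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A caller that measures many characters in a row could call
current_locale_is_utf8() once before the loop and reuse the result, as long
as setlocale() is not called in between.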

Bruno
