Hello,

I confirmed that bison test 127, "Tabulations and multibyte characters" (for Maxwell's equations), passed with the patch for m4/wcwidth.m4.

Regards,

--- Kiyoshi

----- Original Message -----
> From: Bruno Haible <br...@clisp.org>
> To: bug-gnulib@gnu.org
> Cc: Akim Demaille <akim.demai...@gmail.com>; Kiyoshi KANAZAWA <yoi_no_myou...@yahoo.co.jp>
> Date: 2019/5/5, Sun 20:35
> Subject: Re: mbswidth "failure" on Solaris
>
> Hi,
>
>> > 15 | e: {∇⃗×𝐸⃗ = -∂𝐵⃗/∂t}
>> > - | ^~~~~~~~~~~~~~
>> > + | ^~~~~~~~~~~~~~~~~
>
> Indeed, mbswidth seems to have returned 3 more columns.
>
>> The error (three more columns than expected) seems to indicate something
>> related to the combining arrow.
>
> No. The issue comes from the math symbols. The following test program shows
> it:
>
> #include <config.h>
> #include <stdio.h>
> #include <locale.h>
> #include <wchar.h>
> #include "mbswidth.h"
> int main ()
> {
>   setlocale (LC_ALL, "en_US.UTF-8");
>   printf ("%d\n", (int) mbswidth ("{∇⃗×𝐸⃗ = -∂𝐵⃗/∂t}",0)); // 14 vs 17
>   printf ("%d\n", wcwidth (0x2207)); // 1 vs. 2
>   printf ("%d\n", wcwidth (0x20D7)); // 0
>   printf ("%d\n", wcwidth (0x00D7)); // 1
>   printf ("%d\n", wcwidth (0x1D438)); // 1
>   printf ("%d\n", wcwidth (0x2202)); // 1 vs. 2
>   printf ("%d\n", wcwidth (0x1D435)); // 1
> }
>
> The following patch should fix it.
>
> The patch changes the behaviour of wcwidth(0x2202) for UTF-8 locales.
> It would be possible to limit the change to the non-East-Asian UTF-8
> locales (by using the function uc_locale_language() and testing
> whether its result is not one of "zh", "ja", "ko"), but glibc does not
> do this (it uses the same width across all UTF-8 locales), therefore
> I'm not doing it here either.
>
>
> 2019-05-05 Bruno Haible <br...@clisp.org>
>
>     wcwidth: Ensure width 1, not 2, for ambiguous characters.
>     Reported by Kiyoshi KANAZAWA <yoi_no_myou...@yahoo.co.jp>
>     via Akim Demaille <akim.demai...@gmail.com>.
>     * m4/wcwidth.m4 (gl_FUNC_WCWIDTH): Check the width of U+2202. Use an
>     en_US.UTF-8 locale, since that is more likely to be present than an
>     fr_FR.UTF-8 locale.
>     * tests/test-wcwidth.c (main): Check the width of U+2202.
>     * doc/posix-functions/wcwidth.texi: Mention the issue.
>
> diff --git a/m4/wcwidth.m4 b/m4/wcwidth.m4
> index 3952fd2..e9b5bf4 100644
> --- a/m4/wcwidth.m4
> +++ b/m4/wcwidth.m4
> @@ -1,4 +1,4 @@
> -# wcwidth.m4 serial 28
> +# wcwidth.m4 serial 29
>  dnl Copyright (C) 2006-2019 Free Software Foundation, Inc.
>  dnl This file is free software; the Free Software Foundation
>  dnl gives unlimited permission to copy and/or distribute it,
> @@ -54,6 +54,8 @@ AC_DEFUN([gl_FUNC_WCWIDTH],
>      dnl On OSF/1 5.1, wcwidth(0x200B) (ZERO WIDTH SPACE) returns 1.
>      dnl On OpenBSD 5.8, wcwidth(0xFF1A) (FULLWIDTH COLON) returns 0.
>      dnl This leads to bugs in 'ls' (coreutils).
> +    dnl On Solaris 11.4, wcwidth(0x2202) (PARTIAL DIFFERENTIAL) returns 2,
> +    dnl even in Western locales.
>      AC_CACHE_CHECK([whether wcwidth works reasonably in UTF-8 locales],
>        [gl_cv_func_wcwidth_works],
>        [
> @@ -80,7 +82,7 @@ int wcwidth (int);
>  int main ()
>  {
>    int result = 0;
> -  if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL)
> +  if (setlocale (LC_ALL, "en_US.UTF-8") != NULL)
>      {
>        if (wcwidth (0x0301) > 0)
>          result |= 1;
> @@ -90,6 +92,8 @@ int main ()
>          result |= 4;
>        if (wcwidth (0xFF1A) == 0)
>          result |= 8;
> +      if (wcwidth (0x2202) > 1)
> +        result |= 16;
>      }
>    return result;
>  }]])],
> diff --git a/tests/test-wcwidth.c b/tests/test-wcwidth.c
> index eb7bdd2..8e9cea3 100644
> --- a/tests/test-wcwidth.c
> +++ b/tests/test-wcwidth.c
> @@ -72,6 +72,22 @@ main ()
>        ASSERT (wcwidth (0x200B) == 0);
>        ASSERT (wcwidth (0xFEFF) <= 0);
>
> +      /* Test width of some math symbols.
> +         U+2202 is marked as having ambiguous width (A) in EastAsianWidth.txt
> +         (see <https://www.unicode.org/Public/12.0.0/ucd/EastAsianWidth.txt>).
> +         The Unicode Standard Annex 11
> +         <https://www.unicode.org/reports/tr11/tr11-36.html>
> +         says
> +           "Ambiguous characters behave like wide or narrow characters
> +            depending on the context (language tag, script identification,
> +            associated font, source of data, or explicit markup; all can
> +            provide the context). If the context cannot be established
> +            reliably, they should be treated as narrow characters by default."
> +         For wcwidth(), the only available context information is the locale.
> +         "fr_FR.UTF-8" is a Western locale, not an East Asian locale, therefore
> +         U+2202 should be treated like a narrow character. */
> +      ASSERT (wcwidth (0x2202) == 1);
> +
>        /* Test width of some CJK characters. */
>        ASSERT (wcwidth (0x3000) == 2);
>        ASSERT (wcwidth (0xB250) == 2);
> diff --git a/doc/posix-functions/wcwidth.texi b/doc/posix-functions/wcwidth.texi
> index 741be8e..ecdf758 100644
> --- a/doc/posix-functions/wcwidth.texi
> +++ b/doc/posix-functions/wcwidth.texi
> @@ -18,6 +18,10 @@ glibc 2.8.
>  This function handles combining characters in UTF-8 locales incorrectly on some
>  platforms:
>  Mac OS X 10.3, OpenBSD 5.8.
> +@item
> +This function returns 2 for characters with ambiguous east asian width, even in
> +Western locales, on some platforms:
> +Solaris 11.4.
>  @end itemize
>
>  Portability problems not fixed by Gnulib:
>
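
For anyone reading the configure test in the quoted m4/wcwidth.m4 hunk: each failed check sets one bit in the program's exit status, so the status says which wcwidth() expectations were violated. Below is a rough C sketch of decoding such a status. The helper describe_wcwidth_status() and its table are hypothetical and not part of gnulib. The conditions for bits 2 and 4 are not visible in the quoted hunks, so the sketch only labels them as "not shown" and assumes nothing about them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table, not part of gnulib: maps each bit of the configure
   test's exit status to the condition that sets it.  Only the conditions
   visible in the quoted diff are filled in.  */
struct status_bit
{
  int mask;
  const char *meaning;
};

static const struct status_bit status_bits[] = {
  { 1,  "wcwidth (0x0301) > 0" },
  { 2,  "(condition not shown in the quoted diff)" },
  { 4,  "(condition not shown in the quoted diff)" },
  { 8,  "wcwidth (0xFF1A) == 0" },
  { 16, "wcwidth (0x2202) > 1" },   /* the check added by the patch */
};

/* Hypothetical helper: writes one "bit N: condition" line into BUF for
   every bit set in STATUS and returns how many lines were written in
   full.  Stops early if BUF is too small.  */
static int
describe_wcwidth_status (int status, char *buf, size_t bufsize)
{
  size_t used = 0;
  int count = 0;
  size_t i;

  assert (buf != NULL && bufsize > 0);
  buf[0] = '\0';
  for (i = 0; i < sizeof status_bits / sizeof status_bits[0]; i++)
    if (status & status_bits[i].mask)
      {
        int n = snprintf (buf + used, bufsize - used, "bit %d: %s\n",
                          status_bits[i].mask, status_bits[i].meaning);
        if (n < 0 || (size_t) n >= bufsize - used)
          break;                /* buffer too small: stop here */
        used += (size_t) n;
        count++;
      }
  return count;
}
```

With this sketch, a status of 16 (only the new U+2202 check failing, which is what the patch is meant to detect on Solaris 11.4) yields the single line "bit 16: wcwidth (0x2202) > 1", and a status of 0 leaves the buffer empty.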