The function c32rtomb() is like wcrtomb(), except that it takes a 32-bit wide character (char32_t) as argument, not a wchar_t.

While implementing this module, I noted a mistake in the 'mbrtoc32' module:
It assumed that when wchar_t is 32-bit and mbrtoc32() exists in libc,
mbrtoc32() is equivalent to mbrtowc(); in other words, that the char32_t
encoding and the wchar_t encoding of the same multibyte sequence are the same.

But this is not the case! On FreeBSD 12 and Solaris 11.4, the two encodings
are different. The FreeBSD 12 wchar_t encoding is apparently based on ISO 2022
(very old).

The fix is to use mbrtoc32() on platforms where this is possible, namely on
FreeBSD. On Solaris 11.4 and native Windows, however, it is not good to use
the system's mbrtoc32() because it refuses to convert some multibyte sequences
that mbrtowc() supports!

So, we end up using the system's mbrtoc32() and c32rtomb() functions on
  - glibc,
  - FreeBSD,
  - AIX,
and not using them on
  - Solaris 11.4,
  - mingw,
  - MSVC.


2020-01-08  Bruno Haible  <br...@clisp.org>

	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
	but is not working.
	* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
	(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
	differ, use the system's mbrtoc32, adding workarounds.
	* modules/mbrtoc32 (Depends-on): Add hard-locale.
	* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
	Windows problem.
	* lib/btoc32.c: Include <stdio.h>, <string.h>.
	(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
	use mbrtoc32, not btowc.
	* modules/btoc32 (Depends-on): Add mbrtoc32.
	* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
	wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
	* modules/mbsrtoc32s (Depends-on): Update conditions.
	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
	* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
	wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
	* modules/mbsnrtoc32s (Depends-on): Update conditions.
	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.

2020-01-08  Bruno Haible  <br...@clisp.org>

	c32rtomb: Add tests.
	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
	* tests/test-c32rtomb-w32.c: New file, based on
	tests/test-wcrtomb-w32.c.
	* tests/test-c32rtomb-w32-1.sh: New file, based on
	tests/test-wcrtomb-w32-1.sh.
	* tests/test-c32rtomb-w32-2.sh: New file, based on
	tests/test-wcrtomb-w32-2.sh.
	* tests/test-c32rtomb-w32-3.sh: New file, based on
	tests/test-wcrtomb-w32-3.sh.
	* tests/test-c32rtomb-w32-4.sh: New file, based on
	tests/test-wcrtomb-w32-4.sh.
	* tests/test-c32rtomb-w32-5.sh: New file, based on
	tests/test-wcrtomb-w32-5.sh.
	* tests/test-c32rtomb-w32-6.sh: New file, based on
	tests/test-wcrtomb-w32-6.sh.
	* tests/test-c32rtomb-w32-7.sh: New file, based on
	tests/test-wcrtomb-w32-7.sh.
	* modules/c32rtomb-tests: New file.

	c32rtomb: New module.
	* lib/uchar.in.h (c32rtomb): New declaration.
	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
	* m4/c32rtomb.m4: New file.
	* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
	(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
	REPLACE_C32RTOMB.
	* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
	HAVE_C32RTOMB, REPLACE_C32RTOMB.
	* modules/c32rtomb: New file.
	* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
	* doc/posix-functions/c32rtomb.texi: Document the new module.
	* doc/posix-functions/wcrtomb.texi: Mention the new module.

2020-01-08  Bruno Haible  <br...@clisp.org>

	c32tob: Make consistent with mbrtoc32.
	* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
	(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
	use c32rtomb, not wctob.
	* modules/c32tob (Files): Add m4/mbrtoc32.m4.
	(Depends-on): Add c32rtomb.
	(configure.ac): Require gl_MBRTOC32_SANITYCHECK.

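
For illustration only (this is not part of the patches): a minimal sketch of how c32rtomb() is called, mirroring the way wcrtomb() is used. The helper name encode_c32 is hypothetical, and the sketch assumes a platform whose <uchar.h> declares c32rtomb (or a build that uses the Gnulib module).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: convert one 32-bit wide character to its multibyte
   form in the current locale.  BUF must have room for MB_LEN_MAX bytes.
   Returns the number of bytes written, or (size_t) -1 on failure.  */
static size_t
encode_c32 (char32_t c, char *buf)
{
  mbstate_t state;

  assert (buf != NULL);
  /* Start from the initial conversion state, as one would for wcrtomb().  */
  memset (&state, '\0', sizeof state);
  return c32rtomb (buf, c, &state);
}
```

The only difference from the wcrtomb() calling convention is the type of the second argument: char32_t instead of wchar_t.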
From 9be236d67f3d78235c5cbe4381c5dd7b3cddb179 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Thu, 9 Jan 2020 01:47:17 +0100
Subject: [PATCH 1/4] mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.

* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
but is not working.
* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
differ, use the system's mbrtoc32, adding workarounds.
* modules/mbrtoc32 (Depends-on): Add hard-locale.
* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
Windows problem.
* lib/btoc32.c: Include <stdio.h>, <string.h>.
(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
use mbrtoc32, not btowc.
* modules/btoc32 (Depends-on): Add mbrtoc32.
* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
* modules/mbsrtoc32s (Depends-on): Update conditions.
(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
* modules/mbsnrtoc32s (Depends-on): Update conditions.
(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
---
 ChangeLog                         |  25 ++++++++++
 doc/posix-functions/mbrtoc32.texi |   4 ++
 lib/btoc32.c                      |  20 ++++++++
 lib/mbrtoc32.c                    |  53 ++++++++++++++------
 lib/mbsnrtoc32s.c                 |   4 +-
 lib/mbsrtoc32s.c                  |   4 +-
 m4/mbrtoc32.m4                    | 102 +++++++++++++++++++++++++++++++++++++-
 modules/btoc32                    |   1 +
 modules/mbrtoc32                  |   1 +
 modules/mbsnrtoc32s               |  10 ++--
 modules/mbsrtoc32s                |   8 ++-
 11 files changed, 204 insertions(+), 28 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index ea35e7e..4b5a419 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2020-01-08  Bruno Haible  <br...@clisp.org>
+
+	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
+	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
+	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
+	but is not working.
+	* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
+	(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
+	differ, use the system's mbrtoc32, adding workarounds.
+	* modules/mbrtoc32 (Depends-on): Add hard-locale.
+	* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
+	Windows problem.
+	* lib/btoc32.c: Include <stdio.h>, <string.h>.
+	(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
+	use mbrtoc32, not btowc.
+	* modules/btoc32 (Depends-on): Add mbrtoc32.
+	* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
+	wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
+	* modules/mbsrtoc32s (Depends-on): Update conditions.
+	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
+	* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
+	wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
+	* modules/mbsnrtoc32s (Depends-on): Update conditions.
+	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
+
 2020-01-07  Bruno Haible  <br...@clisp.org>
 
 	wcrtomb: Make multithread-safe, except possibly on IRIX.
diff --git a/doc/posix-functions/mbrtoc32.texi b/doc/posix-functions/mbrtoc32.texi
index 1aa15a3..9789bef 100644
--- a/doc/posix-functions/mbrtoc32.texi
+++ b/doc/posix-functions/mbrtoc32.texi
@@ -17,6 +17,10 @@ glibc 2.23.
 This function returns 0 instead of @code{(size_t) -2} when the input
 is empty:
 glibc 2.19.
+@item
+This function does not recognize multibyte sequences that @code{mbrtowc}
+recognizes on some platforms:
+Solaris 11.4, mingw, MSVC 14.
 @end itemize
 
 Portability problems not fixed by Gnulib:
diff --git a/lib/btoc32.c b/lib/btoc32.c
index 8b27875..d8ce087 100644
--- a/lib/btoc32.c
+++ b/lib/btoc32.c
@@ -21,10 +21,30 @@
 /* Specification.  */
 #include <uchar.h>
 
+#include <stdio.h>
+#include <string.h>
+
 wint_t
 btoc32 (int c)
 {
+#if HAVE_WORKING_MBRTOC32 && !defined __GLIBC__
+  /* The char32_t encoding of a multibyte character may be different than its
+     wchar_t encoding.  */
+  if (c != EOF)
+    {
+      mbstate_t state;
+      char s[1];
+      char32_t wc;
+
+      memset (&state, '\0', sizeof (mbstate_t));
+      s[0] = (unsigned char) c;
+      if (mbrtoc32 (&wc, s, 1, &state) <= 1)
+        return wc;
+    }
+  return WEOF;
+#else
   /* In all known locale encodings, unibyte characters correspond only to
      characters in the BMP.  */
   return btowc (c);
+#endif
 }
diff --git a/lib/mbrtoc32.c b/lib/mbrtoc32.c
index f2cf71e..facf28b 100644
--- a/lib/mbrtoc32.c
+++ b/lib/mbrtoc32.c
@@ -24,13 +24,13 @@
 #include <errno.h>
 #include <stdlib.h>
 
-# ifndef FALLTHROUGH
-#  if __GNUC__ < 7
-#   define FALLTHROUGH ((void) 0)
-#  else
-#   define FALLTHROUGH __attribute__ ((__fallthrough__))
-#  endif
+#ifndef FALLTHROUGH
+# if __GNUC__ < 7
+#  define FALLTHROUGH ((void) 0)
+# else
+#  define FALLTHROUGH __attribute__ ((__fallthrough__))
 # endif
+#endif
 
 #if GNULIB_defined_mbstate_t /* AIX, IRIX */
 /* Implement mbrtoc32() on top of mbtowc() for the non-UTF-8 locales
@@ -74,17 +74,23 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
 
 #else /* glibc, macOS, FreeBSD, NetBSD, OpenBSD, HP-UX, Solaris, Cygwin, mingw, MSVC, Minix, Android */
 
-/* Implement mbrtoc32() based on mbrtowc().  */
+/* Implement mbrtoc32() based on the original mbrtoc32() or on mbrtowc().  */
 
 # include <wchar.h>
 
 # include "localcharset.h"
 # include "streq.h"
 
+# if MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ
+#  include "hard-locale.h"
+#  include <locale.h>
+# endif
+
 static mbstate_t internal_state;
 
 size_t
 mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
+# undef mbrtoc32
 {
   /* It's simpler to handle the case s == NULL upfront, than to worry about
      this case later, before every test of pwc and n.  */
@@ -103,7 +109,31 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
   if (ps == NULL)
     ps = &internal_state;
 
-# if _GL_LARGE_CHAR32_T
+# if HAVE_WORKING_MBRTOC32
+  /* mbrtoc32() may produce different values for wc than mbrtowc().  Therefore
+     use mbrtoc32().  */
+
+#  if defined _WIN32 && !defined __CYGWIN__
+  char32_t wc;
+  size_t ret = mbrtoc32 (&wc, s, n, ps);
+  if (ret < (size_t) -2 && pwc != NULL)
+    *pwc = wc;
+#  else
+  size_t ret = mbrtoc32 (pwc, s, n, ps);
+#  endif
+
+#  if MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ
+  if ((size_t) -2 <= ret && n != 0 && ! hard_locale (LC_CTYPE))
+    {
+      if (pwc != NULL)
+        *pwc = (unsigned char) *s;
+      return 1;
+    }
+#  endif
+
+  return ret;
+
+# elif _GL_LARGE_CHAR32_T
 
   /* Special-case all encodings that may produce wide character values
      > WCHAR_MAX.  */
@@ -209,12 +239,7 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
 
 # else
 
-  /* char32_t and wchar_t are equivalent.
-     Two implementations are possible:
-       - We can call the original mbrtoc32 (if it exists) and handle
-         MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ.
-       - We can call mbrtowc.
-     The latter is simpler.  */
+  /* char32_t and wchar_t are equivalent.  Use mbrtowc().  */
   wchar_t wc;
   size_t ret = mbrtowc (&wc, s, n, ps);
   if (ret < (size_t) -2 && pwc != NULL)
diff --git a/lib/mbsnrtoc32s.c b/lib/mbsnrtoc32s.c
index 7ba0415..c0f6e1f 100644
--- a/lib/mbsnrtoc32s.c
+++ b/lib/mbsnrtoc32s.c
@@ -22,7 +22,9 @@
 
 #include <wchar.h>
 
-#if _GL_LARGE_CHAR32_T
+#if (HAVE_WORKING_MBRTOC32 && !defined __GLIBC__) || _GL_LARGE_CHAR32_T
+/* The char32_t encoding of a multibyte character may be different than its
+   wchar_t encoding, or char32_t is wider than wchar_t.  */
 
 /* For Cygwin >= 1.7 it would be possible to speed this up a bit by cutting
    the source into chunks, calling mbsnrtowcs on a chunk, then u16_to_u32 on
diff --git a/lib/mbsrtoc32s.c b/lib/mbsrtoc32s.c
index 432ffaf..8887ddf 100644
--- a/lib/mbsrtoc32s.c
+++ b/lib/mbsrtoc32s.c
@@ -22,7 +22,9 @@
 
 #include <wchar.h>
 
-#if _GL_LARGE_CHAR32_T
+#if (HAVE_WORKING_MBRTOC32 && !defined __GLIBC__) || _GL_LARGE_CHAR32_T
+/* The char32_t encoding of a multibyte character may be different than its
+   wchar_t encoding, or char32_t is wider than wchar_t.  */
 
 # include <errno.h>
 # include <limits.h>
diff --git a/m4/mbrtoc32.m4 b/m4/mbrtoc32.m4
index 5039fc7..3dee900 100644
--- a/m4/mbrtoc32.m4
+++ b/m4/mbrtoc32.m4
@@ -1,4 +1,4 @@
-# mbrtoc32.m4 serial 1
+# mbrtoc32.m4 serial 2
 dnl Copyright (C) 2014-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -11,6 +11,8 @@ AC_DEFUN([gl_FUNC_MBRTOC32],
   AC_REQUIRE([AC_TYPE_MBSTATE_T])
   gl_MBSTATE_T_BROKEN
 
+  AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
+
   AC_CHECK_FUNCS_ONCE([mbrtoc32])
   if test $ac_cv_func_mbrtoc32 = no; then
     HAVE_MBRTOC32=0
@@ -35,6 +37,9 @@ AC_DEFUN([gl_FUNC_MBRTOC32],
           ;;
       esac
     fi
+    if test $HAVE_WORKING_MBRTOC32 = 0; then
+      REPLACE_MBRTOC32=1
+    fi
   fi
 ])
 
@@ -111,6 +116,101 @@ AC_DEFUN([gl_MBRTOC32_C_LOCALE],
     ])
 ])
 
+dnl Test whether mbrtoc32 works not worse than mbrtowc.
+dnl Result is HAVE_WORKING_MBRTOC32.
+
+AC_DEFUN([gl_MBRTOC32_SANITYCHECK],
+[
+  AC_REQUIRE([AC_PROG_CC])
+  AC_CHECK_FUNCS_ONCE([mbrtoc32])
+  AC_REQUIRE([gt_LOCALE_FR])
+  AC_REQUIRE([gt_LOCALE_ZH_CN])
+  AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles
+  if test $ac_cv_func_mbrtoc32 = no; then
+    HAVE_WORKING_MBRTOC32=0
+  else
+    AC_CACHE_CHECK([whether mbrtoc32 works as well as mbrtowc],
+      [gl_cv_func_mbrtoc32_sanitycheck],
+      [
+        dnl Initial guess, used when cross-compiling or when no suitable locale
+        dnl is present.
+changequote(,)dnl
+        case "$host_os" in
+          # Guess no on Solaris, native Windows.
+          solaris* | mingw*) gl_cv_func_mbrtoc32_sanitycheck="guessing no" ;;
+          # Guess yes otherwise.
+          *)                 gl_cv_func_mbrtoc32_sanitycheck="guessing yes" ;;
+        esac
+changequote([,])dnl
+        if test $LOCALE_FR != none || test $LOCALE_ZH_CN != none; then
+          AC_RUN_IFELSE(
+            [AC_LANG_SOURCE([[
+#include <locale.h>
+#include <stdlib.h>
+#include <string.h>
+/* Tru64 with Desktop Toolkit C has a bug: <stdio.h> must be included before
+   <wchar.h>.
+   BSD/OS 4.0.1 has a bug: <stddef.h>, <stdio.h> and <time.h> must be
+   included before <wchar.h>.  */
+#include <stddef.h>
+#include <stdio.h>
+#include <time.h>
+#include <wchar.h>
+#include <uchar.h>
+int main ()
+{
+  int result = 0;
+  /* This fails on native Windows:
+     mbrtoc32 returns (size_t)-1.
+     mbrtowc returns 1 (correct).  */
+  if (setlocale (LC_ALL, "$LOCALE_FR") != NULL)
+    {
+      mbstate_t state;
+      wchar_t wc = (wchar_t) 0xBADFACE;
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (mbrtowc (&wc, "\374", 1, &state) == 1)
+        {
+          char32_t c32 = (wchar_t) 0xBADFACE;
+          memset (&state, '\0', sizeof (mbstate_t));
+          if (mbrtoc32 (&c32, "\374", 1, &state) != 1)
+            result |= 1;
+        }
+    }
+  /* This fails on Solaris 11.4:
+     mbrtoc32 returns (size_t)-1.
+     mbrtowc returns 4 (correct).  */
+  if (setlocale (LC_ALL, "$LOCALE_ZH_CN") != NULL)
+    {
+      mbstate_t state;
+      wchar_t wc = (wchar_t) 0xBADFACE;
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (mbrtowc (&wc, "\224\071\375\067", 4, &state) == 4)
+        {
+          char32_t c32 = (wchar_t) 0xBADFACE;
+          memset (&state, '\0', sizeof (mbstate_t));
+          if (mbrtoc32 (&c32, "\224\071\375\067", 4, &state) != 4)
+            result |= 2;
+        }
+    }
+  return result;
+}]])],
+            [gl_cv_func_mbrtoc32_sanitycheck=yes],
+            [gl_cv_func_mbrtoc32_sanitycheck=no],
+            [:])
+        fi
+      ])
+    case "$gl_cv_func_mbrtoc32_sanitycheck" in
+      *yes)
+        HAVE_WORKING_MBRTOC32=1
+        AC_DEFINE([HAVE_WORKING_MBRTOC32], [1],
+          [Define if the mbrtoc32 function basically works.])
+        ;;
+      *) HAVE_WORKING_MBRTOC32=0 ;;
+    esac
+  fi
+  AC_SUBST([HAVE_WORKING_MBRTOC32])
+])
+
 # Prerequisites of lib/mbrtoc32.c and lib/lc-charset-dispatch.c.
 AC_DEFUN([gl_PREREQ_MBRTOC32], [
   :
diff --git a/modules/btoc32 b/modules/btoc32
index 5e5d4a9..caf36d3 100644
--- a/modules/btoc32
+++ b/modules/btoc32
@@ -6,6 +6,7 @@ lib/btoc32.c
 
 Depends-on:
 uchar
+mbrtoc32
 btowc
 
 configure.ac:
diff --git a/modules/mbrtoc32 b/modules/mbrtoc32
index 2575394..cf41846 100644
--- a/modules/mbrtoc32
+++ b/modules/mbrtoc32
@@ -18,6 +18,7 @@ m4/visibility.m4
 
 Depends-on:
 uchar
+hard-locale     [{ test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1; } && test $REPLACE_MBSTATE_T = 0]
 mbrtowc         [{ test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1; } && test $REPLACE_MBSTATE_T = 0]
 localcharset    [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
 streq           [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
diff --git a/modules/mbsnrtoc32s b/modules/mbsnrtoc32s
index 44784d8..ac464a8 100644
--- a/modules/mbsnrtoc32s
+++ b/modules/mbsnrtoc32s
@@ -10,16 +10,14 @@ Depends-on:
 uchar
 wchar
 verify
-mbrtoc32        [test $SMALL_WCHAR_T = 1]
-minmax          [test $SMALL_WCHAR_T = 1]
-strnlen1        [test $SMALL_WCHAR_T = 1]
+mbrtoc32
+minmax
+strnlen1
 mbsnrtowcs      [test $SMALL_WCHAR_T = 0]
 
 configure.ac:
 AC_REQUIRE([gl_UCHAR_H])
-if test $SMALL_WCHAR_T = 1; then
-  AC_LIBOBJ([mbsrtoc32s-state])
-fi
+AC_LIBOBJ([mbsrtoc32s-state])
 gl_UCHAR_MODULE_INDICATOR([mbsnrtoc32s])
 
 Makefile.am:
diff --git a/modules/mbsrtoc32s b/modules/mbsrtoc32s
index e7e5ee2..64892cf 100644
--- a/modules/mbsrtoc32s
+++ b/modules/mbsrtoc32s
@@ -10,15 +10,13 @@ Depends-on:
 uchar
 wchar
 verify
-mbrtoc32        [test $SMALL_WCHAR_T = 1]
-strnlen1        [test $SMALL_WCHAR_T = 1]
+mbrtoc32
+strnlen1
 mbsrtowcs       [test $SMALL_WCHAR_T = 0]
 
 configure.ac:
 AC_REQUIRE([gl_UCHAR_H])
-if test $SMALL_WCHAR_T = 1; then
-  AC_LIBOBJ([mbsrtoc32s-state])
-fi
+AC_LIBOBJ([mbsrtoc32s-state])
 gl_UCHAR_MODULE_INDICATOR([mbsrtoc32s])
 
 Makefile.am:
--
2.7.4

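
As a rough illustration of the idea behind gl_MBRTOC32_SANITYCHECK in the patch above (a sketch under assumptions, not the configure test itself): whenever mbrtowc() accepts a byte sequence, mbrtoc32() should consume the same number of bytes. The function name mbrtoc32_not_worse is hypothetical, and the check runs in whatever locale is currently active instead of switching to the French and Chinese locales as the real test does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: return 1 if mbrtoc32() handles the N bytes at BYTES
   at least as well as mbrtowc() does in the current locale, 0 otherwise.  */
static int
mbrtoc32_not_worse (const char *bytes, size_t n)
{
  mbstate_t state;
  wchar_t wc;
  char32_t c32;
  size_t ret_wc;
  size_t ret_c32;

  assert (bytes != NULL);

  memset (&state, '\0', sizeof state);
  ret_wc = mbrtowc (&wc, bytes, n, &state);
  if (ret_wc >= (size_t) -2)
    /* mbrtowc() itself rejects the input or finds it incomplete, so there
       is nothing to hold mbrtoc32() to.  */
    return 1;

  memset (&state, '\0', sizeof state);
  ret_c32 = mbrtoc32 (&c32, bytes, n, &state);
  return ret_c32 == ret_wc;
}
```

On the platforms the patch calls out (Solaris 11.4, native Windows), such a comparison would be expected to fail for the specific inputs the configure test uses; for plain ASCII input it should succeed everywhere.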
From 4ec96253823bde7488bfee4ee5d890792d6b555b Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Thu, 9 Jan 2020 01:56:35 +0100
Subject: [PATCH 2/4] c32rtomb: New module.

* lib/uchar.in.h (c32rtomb): New declaration.
* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
* m4/c32rtomb.m4: New file.
* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
REPLACE_C32RTOMB.
* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
HAVE_C32RTOMB, REPLACE_C32RTOMB.
* modules/c32rtomb: New file.
* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
* doc/posix-functions/c32rtomb.texi: Document the new module.
* doc/posix-functions/wcrtomb.texi: Mention the new module.
---
 ChangeLog                         |  16 +++++
 doc/posix-functions/c32rtomb.texi |  11 ++--
 doc/posix-functions/wcrtomb.texi  |   7 ++-
 lib/c32rtomb.c                    | 124 ++++++++++++++++++++++++++++++++++++++
 lib/uchar.in.h                    |  25 ++++++++
 m4/c32rtomb.m4                    |  55 +++++++++++++++++
 m4/uchar.m4                       |   7 ++-
 modules/c32rtomb                  |  32 ++++++++++
 modules/uchar                     |   3 +
 tests/test-uchar-c++.cc           |   5 ++
 10 files changed, 277 insertions(+), 8 deletions(-)
 create mode 100644 lib/c32rtomb.c
 create mode 100644 m4/c32rtomb.m4
 create mode 100644 modules/c32rtomb

diff --git a/ChangeLog b/ChangeLog
index 4b5a419..3ad99ff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,21 @@
 2020-01-08  Bruno Haible  <br...@clisp.org>
 
+	c32rtomb: New module.
+	* lib/uchar.in.h (c32rtomb): New declaration.
+	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
+	* m4/c32rtomb.m4: New file.
+	* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
+	(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
+	REPLACE_C32RTOMB.
+	* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
+	HAVE_C32RTOMB, REPLACE_C32RTOMB.
+	* modules/c32rtomb: New file.
+	* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
+	* doc/posix-functions/c32rtomb.texi: Document the new module.
+	* doc/posix-functions/wcrtomb.texi: Mention the new module.
+
+2020-01-08  Bruno Haible  <br...@clisp.org>
+
 	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
 	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
 	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
diff --git a/doc/posix-functions/c32rtomb.texi b/doc/posix-functions/c32rtomb.texi
index 392bbe9..4a1a617 100644
--- a/doc/posix-functions/c32rtomb.texi
+++ b/doc/posix-functions/c32rtomb.texi
@@ -2,15 +2,18 @@
 @section @code{c32rtomb}
 @findex c32rtomb
 
-Gnulib module: ---
+Gnulib module: c32rtomb
 
 Portability problems fixed by Gnulib:
 @itemize
+@item
+This function is missing on most non-glibc platforms:
+glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
+@item
+This function returns 0 when the first argument is NULL in some locales on some platforms:
+AIX 7.2.
 @end itemize
 
 Portability problems not fixed by Gnulib:
 @itemize
-@item
-This function is missing on most non-glibc platforms:
-glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
 @end itemize
diff --git a/doc/posix-functions/wcrtomb.texi b/doc/posix-functions/wcrtomb.texi
index 232bea4..28b8dfe 100644
--- a/doc/posix-functions/wcrtomb.texi
+++ b/doc/posix-functions/wcrtomb.texi
@@ -25,6 +25,9 @@ MSVC 14.
 Portability problems not fixed by Gnulib:
 @itemize
 @item
-On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot
-accommodate all Unicode characters.
+On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and
+therefore cannot accommodate all Unicode characters.
+However, the ISO C11 function @code{c32rtomb}, provided by Gnulib module
+@code{c32rtomb}, operates on 32-bit wide characters and therefore does not have
+this limitation.
 @end itemize
diff --git a/lib/c32rtomb.c b/lib/c32rtomb.c
new file mode 100644
index 0000000..ba39929
--- /dev/null
+++ b/lib/c32rtomb.c
@@ -0,0 +1,124 @@
+/* Convert 32-bit wide character to multibyte character.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2020.  */
+
+#include <config.h>
+
+/* Specification.  */
+#include <uchar.h>
+
+#include <errno.h>
+#include <wchar.h>
+
+#include "localcharset.h"
+#include "streq.h"
+
+#ifndef FALLTHROUGH
+# if __GNUC__ < 7
+#  define FALLTHROUGH ((void) 0)
+# else
+#  define FALLTHROUGH __attribute__ ((__fallthrough__))
+# endif
+#endif
+
+size_t
+c32rtomb (char *s, char32_t wc, mbstate_t *ps)
+#undef c32rtomb
+{
+#if HAVE_WORKING_MBRTOC32
+
+# if C32RTOMB_RETVAL_BUG
+  if (s == NULL)
+    /* We know the NUL wide character corresponds to the NUL character.  */
+    return 1;
+# endif
+
+  return c32rtomb (s, wc, ps);
+
+#elif _GL_LARGE_CHAR32_T
+
+  if (s == NULL)
+    return wcrtomb (NULL, 0, ps);
+  else
+    {
+      /* Special-case all encodings that may produce wide character values
+         > WCHAR_MAX.  */
+      const char *encoding = locale_charset ();
+      if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
+        {
+          /* Special-case the UTF-8 encoding.  Assume that the wide-character
+             encoding in a UTF-8 locale is UCS-2 or, equivalently, UTF-16.  */
+          if (wc < 0x80)
+            {
+              s[0] = (unsigned char) wc;
+              return 1;
+            }
+          else
+            {
+              int count;
+
+              if (wc < 0x800)
+                count = 2;
+              else if (wc < 0x10000)
+                {
+                  if (wc < 0xd800 || wc >= 0xe000)
+                    count = 3;
+                  else
+                    {
+                      errno = EILSEQ;
+                      return (size_t)(-1);
+                    }
+                }
+              else if (wc < 0x110000)
+                count = 4;
+              else
+                {
+                  errno = EILSEQ;
+                  return (size_t)(-1);
+                }
+
+              switch (count) /* note: code falls through cases! */
+                {
+                case 4: s[3] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x10000;
+                  FALLTHROUGH;
+                case 3: s[2] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x800;
+                  FALLTHROUGH;
+                case 2: s[1] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0xc0;
+                /*case 1:*/ s[0] = wc;
+                }
+              return count;
+            }
+        }
+      else
+        {
+          if ((wchar_t) wc == wc)
+            return wcrtomb (s, (wchar_t) wc, ps);
+          else
+            {
+              errno = EILSEQ;
+              return (size_t)(-1);
+            }
+        }
+    }
+
+#else
+
+  /* char32_t and wchar_t are equivalent.  */
+  return wcrtomb (s, (wchar_t) wc, ps);
+
+#endif
+}
diff --git a/lib/uchar.in.h b/lib/uchar.in.h
index 513fa8c..dbbfc30 100644
--- a/lib/uchar.in.h
+++ b/lib/uchar.in.h
@@ -68,6 +68,31 @@ _GL_CXXALIASWARN (btoc32);
 #endif
 
 
+/* Converts a 32-bit wide character to a multibyte character.  */
+#if @GNULIB_C32RTOMB@
+# if @REPLACE_C32RTOMB@
+#  if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+#   undef c32rtomb
+#   define c32rtomb rpl_c32rtomb
+#  endif
+_GL_FUNCDECL_RPL (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+_GL_CXXALIAS_RPL (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+# else
+#  if !@HAVE_C32RTOMB@
+_GL_FUNCDECL_SYS (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+#  endif
+_GL_CXXALIAS_SYS (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+# endif
+_GL_CXXALIASWARN (c32rtomb);
+#elif defined GNULIB_POSIXCHECK
+# undef c32rtomb
+# if HAVE_RAW_DECL_C32RTOMB
+_GL_WARN_ON_USE (c32rtomb, "c32rtomb is not portable - "
+                 "use gnulib module c32rtomb for portability");
+# endif
+#endif
+
+
 /* Converts a 32-bit wide character to unibyte character.
    Returns the single-byte representation of WC if it exists,
    or EOF otherwise.  */
diff --git a/m4/c32rtomb.m4 b/m4/c32rtomb.m4
new file mode 100644
index 0000000..4cf0e4d
--- /dev/null
+++ b/m4/c32rtomb.m4
@@ -0,0 +1,55 @@
+# c32rtomb.m4 serial 1
+dnl Copyright (C) 2020 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+
+AC_DEFUN([gl_FUNC_C32RTOMB],
+[
+  AC_REQUIRE([gl_UCHAR_H_DEFAULTS])
+
+  AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
+
+  AC_CHECK_FUNCS_ONCE([c32rtomb])
+  if test $ac_cv_func_c32rtomb = no; then
+    HAVE_C32RTOMB=0
+  else
+    dnl When we override mbrtoc32, redefining the meaning of the char32_t
+    dnl values, we need to override c32rtomb as well, for consistency.
+    if test $HAVE_WORKING_MBRTOC32 = 0; then
+      REPLACE_C32RTOMB=1
+    fi
+    AC_CACHE_CHECK([whether c32rtomb return value is correct],
+      [gl_cv_func_c32rtomb_retval],
+      [
+        dnl Initial guess, used when cross-compiling.
+changequote(,)dnl
+        case "$host_os" in
+          # Guess no on AIX.
+          aix*) gl_cv_func_c32rtomb_retval="guessing no" ;;
+          # Guess yes otherwise.
+          *)    gl_cv_func_c32rtomb_retval="guessing yes" ;;
+        esac
+changequote([,])dnl
+        AC_RUN_IFELSE(
+          [AC_LANG_SOURCE([[
+#include <uchar.h>
+int main ()
+{
+  int result = 0;
+  if (c32rtomb (NULL, 0, NULL) != 1)
+    result |= 1;
+  return result;
+}]])],
+          [gl_cv_func_c32rtomb_retval=yes],
+          [gl_cv_func_c32rtomb_retval=no],
+          [:])
+      ])
+    case "$gl_cv_func_c32rtomb_retval" in
+      *yes) ;;
+      *) AC_DEFINE([C32RTOMB_RETVAL_BUG], [1],
+           [Define if the c32rtomb function has an incorrect return value.])
+         REPLACE_C32RTOMB=1 ;;
+    esac
+  fi
+])
diff --git a/m4/uchar.m4 b/m4/uchar.m4
index 0b5c662..be71196 100644
--- a/m4/uchar.m4
+++ b/m4/uchar.m4
@@ -1,4 +1,4 @@
-# uchar.m4 serial 8
+# uchar.m4 serial 9
 dnl Copyright (C) 2019-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -33,7 +33,7 @@ AC_DEFUN_ONCE([gl_UCHAR_H],
   dnl corresponding gnulib module is not in use, and which is not
   dnl guaranteed by C11.
   gl_WARN_ON_USE_PREPARE([[#include <uchar.h>
-    ]], [mbrtoc32])
+    ]], [c32rtomb mbrtoc32])
 ])
 
 AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
@@ -48,12 +48,15 @@ AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
 AC_DEFUN([gl_UCHAR_H_DEFAULTS],
 [
   GNULIB_BTOC32=0;       AC_SUBST([GNULIB_BTOC32])
+  GNULIB_C32RTOMB=0;     AC_SUBST([GNULIB_C32RTOMB])
   GNULIB_C32TOB=0;       AC_SUBST([GNULIB_C32TOB])
   GNULIB_MBRTOC32=0;     AC_SUBST([GNULIB_MBRTOC32])
   GNULIB_MBSNRTOC32S=0;  AC_SUBST([GNULIB_MBSNRTOC32S])
   GNULIB_MBSRTOC32S=0;   AC_SUBST([GNULIB_MBSRTOC32S])
   GNULIB_MBSTOC32S=0;    AC_SUBST([GNULIB_MBSTOC32S])
   dnl Assume proper GNU behavior unless another module says otherwise.
+  HAVE_C32RTOMB=1;       AC_SUBST([HAVE_C32RTOMB])
   HAVE_MBRTOC32=1;       AC_SUBST([HAVE_MBRTOC32])
+  REPLACE_C32RTOMB=0;    AC_SUBST([REPLACE_C32RTOMB])
   REPLACE_MBRTOC32=0;    AC_SUBST([REPLACE_MBRTOC32])
 ])
diff --git a/modules/c32rtomb b/modules/c32rtomb
new file mode 100644
index 0000000..ea227df
--- /dev/null
+++ b/modules/c32rtomb
@@ -0,0 +1,32 @@
+Description:
+c32rtomb() function: convert 32-bit wide character to multibyte character.
+
+Files:
+lib/c32rtomb.c
+m4/c32rtomb.m4
+m4/mbrtoc32.m4
+
+Depends-on:
+uchar
+wchar           [test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1]
+wcrtomb         [test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1]
+localcharset    [{ test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; } && test $SMALL_WCHAR_T = 1]
+streq           [{ test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; } && test $SMALL_WCHAR_T = 1]
+
+configure.ac:
+gl_FUNC_C32RTOMB
+if test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; then
+  AC_LIBOBJ([c32rtomb])
+fi
+gl_UCHAR_MODULE_INDICATOR([c32rtomb])
+
+Makefile.am:
+
+Include:
+<uchar.h>
+
+License:
+LGPLv2+
+
+Maintainer:
+Bruno Haible
diff --git a/modules/uchar b/modules/uchar
index 29bc7ae..cab4518 100644
--- a/modules/uchar
+++ b/modules/uchar
@@ -29,12 +29,15 @@ uchar.h: uchar.in.h $(top_builddir)/config.status $(CXXDEFS_H)
 	      -e 's|@''NEXT_UCHAR_H''@|$(NEXT_UCHAR_H)|g' \
 	      -e 's|@''SMALL_WCHAR_T''@|$(SMALL_WCHAR_T)|g' \
 	      -e 's/@''GNULIB_BTOC32''@/$(GNULIB_BTOC32)/g' \
+	      -e 's/@''GNULIB_C32RTOMB''@/$(GNULIB_C32RTOMB)/g' \
 	      -e 's/@''GNULIB_C32TOB''@/$(GNULIB_C32TOB)/g' \
 	      -e 's/@''GNULIB_MBRTOC32''@/$(GNULIB_MBRTOC32)/g' \
 	      -e 's/@''GNULIB_MBSNRTOC32S''@/$(GNULIB_MBSNRTOC32S)/g' \
 	      -e 's/@''GNULIB_MBSRTOC32S''@/$(GNULIB_MBSRTOC32S)/g' \
 	      -e 's/@''GNULIB_MBSTOC32S''@/$(GNULIB_MBSTOC32S)/g' \
+	      -e 's|@''HAVE_C32RTOMB''@|$(HAVE_C32RTOMB)|g' \
 	      -e 's|@''HAVE_MBRTOC32''@|$(HAVE_MBRTOC32)|g' \
+	      -e 's|@''REPLACE_C32RTOMB''@|$(REPLACE_C32RTOMB)|g' \
 	      -e 's|@''REPLACE_MBRTOC32''@|$(REPLACE_MBRTOC32)|g' \
 	      -e '/definitions of _GL_FUNCDECL_RPL/r $(CXXDEFS_H)' \
 	      < $(srcdir)/uchar.in.h; \
diff --git a/tests/test-uchar-c++.cc b/tests/test-uchar-c++.cc
index 3e71c89..ed45da2 100644
--- a/tests/test-uchar-c++.cc
+++ b/tests/test-uchar-c++.cc
@@ -28,6 +28,11 @@
 SIGNATURE_CHECK (GNULIB_NAMESPACE::btoc32, wint_t, (int));
 #endif
 
+#if GNULIB_TEST_C32RTOMB
+SIGNATURE_CHECK (GNULIB_NAMESPACE::c32rtomb, size_t,
+                 (char *, char32_t , mbstate_t *));
+#endif
+
 #if GNULIB_TEST_C32TOB
 SIGNATURE_CHECK (GNULIB_NAMESPACE::c32tob, int, (wint_t));
 #endif
--
2.7.4

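
To make the fall-through switch in the UTF-8 branch of lib/c32rtomb.c above easier to follow, here is a standalone sketch of the same encoding step (not part of the patch). The function name utf8_encode is hypothetical; unlike the patch, it writes to an unsigned char buffer and does not set errno.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode WC as UTF-8 into S (room for 4 bytes).
   Returns the number of bytes written, or (size_t) -1 if WC is a
   surrogate or lies beyond U+10FFFF.  */
static size_t
utf8_encode (char32_t wc, unsigned char *s)
{
  int count;

  assert (s != NULL);

  if (wc < 0x80)
    {
      s[0] = (unsigned char) wc;
      return 1;
    }
  else if (wc < 0x800)
    count = 2;
  else if (wc < 0x10000)
    {
      if (wc >= 0xd800 && wc < 0xe000)
        return (size_t) -1;       /* surrogates are not valid characters */
      count = 3;
    }
  else if (wc < 0x110000)
    count = 4;
  else
    return (size_t) -1;           /* outside the Unicode range */

  /* Emit the trailing bytes last to first.  Each step stores the low 6 bits,
     shifts them out, and ORs in the marker bits that the lead byte needs
     once the remaining shifts are done.  */
  switch (count)
    {
    case 4:
      s[3] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x10000;
      /* FALLTHROUGH */
    case 3:
      s[2] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x800;
      /* FALLTHROUGH */
    case 2:
      s[1] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0xc0;
      s[0] = (unsigned char) wc;
    }
  return count;
}
```

For U+20AC, the three steps yield the bytes 0xE2 0x82 0xAC: the constants 0x800 and 0xc0 accumulate to the 0xE0 lead-byte prefix after the shifts.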
From 18f05ac59765d532823d48c061d0dcac7c55007e Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Thu, 9 Jan 2020 02:00:19 +0100 Subject: [PATCH 3/4] c32rtomb: Add tests. * tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c. * tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh. * tests/test-c32rtomb-w32.c: New file, based on tests/test-wcrtomb-w32.c. * tests/test-c32rtomb-w32-1.sh: New file, based on tests/test-wcrtomb-w32-1.sh. * tests/test-c32rtomb-w32-2.sh: New file, based on tests/test-wcrtomb-w32-2.sh. * tests/test-c32rtomb-w32-3.sh: New file, based on tests/test-wcrtomb-w32-3.sh. * tests/test-c32rtomb-w32-4.sh: New file, based on tests/test-wcrtomb-w32-4.sh. * tests/test-c32rtomb-w32-5.sh: New file, based on tests/test-wcrtomb-w32-5.sh. * tests/test-c32rtomb-w32-6.sh: New file, based on tests/test-wcrtomb-w32-6.sh. * tests/test-c32rtomb-w32-7.sh: New file, based on tests/test-wcrtomb-w32-7.sh. * modules/c32rtomb-tests: New file. --- ChangeLog | 21 +++ modules/c32rtomb-tests | 43 ++++++ tests/test-c32rtomb-w32-1.sh | 4 + tests/test-c32rtomb-w32-2.sh | 4 + tests/test-c32rtomb-w32-3.sh | 4 + tests/test-c32rtomb-w32-4.sh | 4 + tests/test-c32rtomb-w32-5.sh | 4 + tests/test-c32rtomb-w32-6.sh | 4 + tests/test-c32rtomb-w32-7.sh | 4 + tests/test-c32rtomb-w32.c | 349 +++++++++++++++++++++++++++++++++++++++++++ tests/test-c32rtomb.c | 170 +++++++++++++++++++++ tests/test-c32rtomb.sh | 39 +++++ 12 files changed, 650 insertions(+) create mode 100644 modules/c32rtomb-tests create mode 100755 tests/test-c32rtomb-w32-1.sh create mode 100755 tests/test-c32rtomb-w32-2.sh create mode 100755 tests/test-c32rtomb-w32-3.sh create mode 100755 tests/test-c32rtomb-w32-4.sh create mode 100755 tests/test-c32rtomb-w32-5.sh create mode 100755 tests/test-c32rtomb-w32-6.sh create mode 100755 tests/test-c32rtomb-w32-7.sh create mode 100644 tests/test-c32rtomb-w32.c create mode 100644 tests/test-c32rtomb.c create mode 100755 tests/test-c32rtomb.sh 
diff --git a/ChangeLog b/ChangeLog
index 3ad99ff..c303d41 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,26 @@
 2020-01-08  Bruno Haible  <br...@clisp.org>
 
+	c32rtomb: Add tests.
+	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
+	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
+	* tests/test-c32rtomb-w32.c: New file, based on
+	tests/test-wcrtomb-w32.c.
+	* tests/test-c32rtomb-w32-1.sh: New file, based on
+	tests/test-wcrtomb-w32-1.sh.
+	* tests/test-c32rtomb-w32-2.sh: New file, based on
+	tests/test-wcrtomb-w32-2.sh.
+	* tests/test-c32rtomb-w32-3.sh: New file, based on
+	tests/test-wcrtomb-w32-3.sh.
+	* tests/test-c32rtomb-w32-4.sh: New file, based on
+	tests/test-wcrtomb-w32-4.sh.
+	* tests/test-c32rtomb-w32-5.sh: New file, based on
+	tests/test-wcrtomb-w32-5.sh.
+	* tests/test-c32rtomb-w32-6.sh: New file, based on
+	tests/test-wcrtomb-w32-6.sh.
+	* tests/test-c32rtomb-w32-7.sh: New file, based on
+	tests/test-wcrtomb-w32-7.sh.
+	* modules/c32rtomb-tests: New file.
+
 	c32rtomb: New module.
 	* lib/uchar.in.h (c32rtomb): New declaration.
 	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
diff --git a/modules/c32rtomb-tests b/modules/c32rtomb-tests
new file mode 100644
index 0000000..a8d2bee
--- /dev/null
+++ b/modules/c32rtomb-tests
@@ -0,0 +1,43 @@
+Files:
+tests/test-c32rtomb.sh
+tests/test-c32rtomb.c
+tests/test-c32rtomb-w32-1.sh
+tests/test-c32rtomb-w32-2.sh
+tests/test-c32rtomb-w32-3.sh
+tests/test-c32rtomb-w32-4.sh
+tests/test-c32rtomb-w32-5.sh
+tests/test-c32rtomb-w32-6.sh
+tests/test-c32rtomb-w32-7.sh
+tests/test-c32rtomb-w32.c
+tests/signature.h
+tests/macros.h
+m4/locale-fr.m4
+m4/locale-ja.m4
+m4/locale-zh.m4
+m4/codeset.m4
+
+Depends-on:
+btoc32
+mbrtoc32
+setlocale
+localcharset
+
+configure.ac:
+gt_LOCALE_FR
+gt_LOCALE_FR_UTF8
+gt_LOCALE_JA
+gt_LOCALE_ZH_CN
+
+Makefile.am:
+TESTS += \
+  test-c32rtomb.sh \
+  test-c32rtomb-w32-1.sh test-c32rtomb-w32-2.sh test-c32rtomb-w32-3.sh \
+  test-c32rtomb-w32-4.sh test-c32rtomb-w32-5.sh test-c32rtomb-w32-6.sh \
+  test-c32rtomb-w32-7.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_FR='@LOCALE_FR@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_JA='@LOCALE_JA@' \
+  LOCALE_ZH_CN='@LOCALE_ZH_CN@'
+check_PROGRAMS += test-c32rtomb test-c32rtomb-w32
+test_c32rtomb_LDADD = $(LDADD) $(LIB_SETLOCALE)
diff --git a/tests/test-c32rtomb-w32-1.sh b/tests/test-c32rtomb-w32-1.sh
new file mode 100755
index 0000000..e797d0e
--- /dev/null
+++ b/tests/test-c32rtomb-w32-1.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1252 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} French_France 1252
diff --git a/tests/test-c32rtomb-w32-2.sh b/tests/test-c32rtomb-w32-2.sh
new file mode 100755
index 0000000..1b63d47
--- /dev/null
+++ b/tests/test-c32rtomb-w32-2.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1256 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} "Arabic_Saudi Arabia" 1256
diff --git a/tests/test-c32rtomb-w32-3.sh b/tests/test-c32rtomb-w32-3.sh
new file mode 100755
index 0000000..ff59a87
--- /dev/null
+++ b/tests/test-c32rtomb-w32-3.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP932 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} Japanese_Japan 932 diff --git a/tests/test-c32rtomb-w32-4.sh b/tests/test-c32rtomb-w32-4.sh new file mode 100755 index 0000000..3cf3406 --- /dev/null +++ b/tests/test-c32rtomb-w32-4.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP950 locale. +${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_Taiwan 950 diff --git a/tests/test-c32rtomb-w32-5.sh b/tests/test-c32rtomb-w32-5.sh new file mode 100755 index 0000000..2174c0b --- /dev/null +++ b/tests/test-c32rtomb-w32-5.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP936 locale. +${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_China 936 diff --git a/tests/test-c32rtomb-w32-6.sh b/tests/test-c32rtomb-w32-6.sh new file mode 100755 index 0000000..b7e77b2 --- /dev/null +++ b/tests/test-c32rtomb-w32-6.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a GB18030 locale. +${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_China 54936 diff --git a/tests/test-c32rtomb-w32-7.sh b/tests/test-c32rtomb-w32-7.sh new file mode 100755 index 0000000..3c0f3db --- /dev/null +++ b/tests/test-c32rtomb-w32-7.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test some UTF-8 locales. +${CHECKER} ./test-c32rtomb-w32${EXEEXT} French_France Japanese_Japan Chinese_Taiwan Chinese_China 65001 diff --git a/tests/test-c32rtomb-w32.c b/tests/test-c32rtomb-w32.c new file mode 100644 index 0000000..18630c7 --- /dev/null +++ b/tests/test-c32rtomb-w32.c @@ -0,0 +1,349 @@ +/* Test of conversion of wide character to multibyte character. + Copyright (C) 2008-2020 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. */ + +#include <config.h> + +#include <uchar.h> + +#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "localcharset.h" +#include "macros.h" + +#if defined _WIN32 && !defined __CYGWIN__ + +static int +test_one_locale (const char *name, int codepage) +{ + char buf[64]; + size_t ret; + +# if 1 + /* Portable code to set the locale. */ + { + char name_with_codepage[1024]; + + sprintf (name_with_codepage, "%s.%d", name, codepage); + + /* Set the locale. */ + if (setlocale (LC_ALL, name_with_codepage) == NULL) + return 77; + } +# else + /* Hacky way to set a locale.codepage combination that setlocale() refuses + to set. */ + { + /* Codepage of the current locale, set with setlocale(). + Not necessarily the same as GetACP(). */ + extern __declspec(dllimport) unsigned int __lc_codepage; + + /* Set the locale. */ + if (setlocale (LC_ALL, name) == NULL) + return 77; + + /* Clobber the codepage and MB_CUR_MAX, both set by setlocale(). */ + __lc_codepage = codepage; + switch (codepage) + { + case 1252: + case 1256: + MB_CUR_MAX = 1; + break; + case 932: + case 950: + case 936: + MB_CUR_MAX = 2; + break; + case 54936: + case 65001: + MB_CUR_MAX = 4; + break; + } + + /* Test whether the codepage is really available. */ + { + mbstate_t state; + wchar_t wc; + + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtowc (&wc, " ", 1, &state) == (size_t)(-1)) + return 77; + } + } +# endif + + /* Test NUL character. */ + { + buf[0] = 'x'; + ret = c32rtomb (buf, 0, NULL); + ASSERT (ret == 1); + ASSERT (buf[0] == '\0'); + } + + /* Test single bytes. 
*/ + { + int c; + + for (c = 0; c < 0x100; c++) + switch (c) + { + case '\t': case '\v': case '\f': + case ' ': case '!': case '"': case '#': case '%': + case '&': case '\'': case '(': case ')': case '*': + case '+': case ',': case '-': case '.': case '/': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + case ':': case ';': case '<': case '=': case '>': + case '?': + case 'A': case 'B': case 'C': case 'D': case 'E': + case 'F': case 'G': case 'H': case 'I': case 'J': + case 'K': case 'L': case 'M': case 'N': case 'O': + case 'P': case 'Q': case 'R': case 'S': case 'T': + case 'U': case 'V': case 'W': case 'X': case 'Y': + case 'Z': + case '[': case '\\': case ']': case '^': case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': + case 'f': case 'g': case 'h': case 'i': case 'j': + case 'k': case 'l': case 'm': case 'n': case 'o': + case 'p': case 'q': case 'r': case 's': case 't': + case 'u': case 'v': case 'w': case 'x': case 'y': + case 'z': case '{': case '|': case '}': case '~': + /* c is in the ISO C "basic character set". */ + ret = c32rtomb (buf, btoc32 (c), NULL); + ASSERT (ret == 1); + ASSERT (buf[0] == (char) c); + break; + } + } + + /* Test special calling convention, passing a NULL pointer. */ + { + ret = c32rtomb (NULL, '\0', NULL); + ASSERT (ret == 1); + ret = c32rtomb (NULL, btoc32 ('x'), NULL); + ASSERT (ret == 1); + } + + switch (codepage) + { + case 1252: + /* Locale encoding is CP1252, an extension of ISO-8859-1. */ + { + /* Convert "B\374\337er": "Büßer" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00FC, NULL); + ASSERT (ret == 1); + ASSERT (memcmp (buf, "\374", 1) == 0); + ASSERT (buf[1] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00DF, NULL); + ASSERT (ret == 1); + ASSERT (memcmp (buf, "\337", 1) == 0); + ASSERT (buf[1] == 'x'); + } + return 0; + + case 1256: + /* Locale encoding is CP1256, not the same as ISO-8859-6. 
*/ + { + /* Convert "x\302\341\346y": "xآلوy" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x0622, NULL); + ASSERT (ret == 1); + ASSERT (memcmp (buf, "\302", 1) == 0); + ASSERT (buf[1] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x0644, NULL); + ASSERT (ret == 1); + ASSERT (memcmp (buf, "\341", 1) == 0); + ASSERT (buf[1] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x0648, NULL); + ASSERT (ret == 1); + ASSERT (memcmp (buf, "\346", 1) == 0); + ASSERT (buf[1] == 'x'); + } + return 0; + + case 932: + /* Locale encoding is CP932, similar to Shift_JIS. */ + { + /* Convert "<\223\372\226\173\214\352>": "<日本語>" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x65E5, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\223\372", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x672C, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\226\173", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x8A9E, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\214\352", 2) == 0); + ASSERT (buf[2] == 'x'); + } + return 0; + + case 950: + /* Locale encoding is CP950, similar to Big5. */ + { + /* Convert "<\244\351\245\273\273\171>": "<日本語>" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x65E5, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\244\351", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x672C, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\245\273", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x8A9E, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\273\171", 2) == 0); + ASSERT (buf[2] == 'x'); + } + return 0; + + case 936: + /* Locale encoding is CP936 = GBK, an extension of GB2312. 
*/ + { + /* Convert "<\310\325\261\276\325\132>": "<日本語>" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x65E5, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\310\325", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x672C, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\261\276", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x8A9E, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\325\132", 2) == 0); + ASSERT (buf[2] == 'x'); + } + return 0; + + case 54936: + /* Locale encoding is CP54936 = GB18030. */ + if (strcmp (locale_charset (), "GB18030") != 0) + return 77; + { + /* Convert "s\250\271\201\060\211\070\224\071\375\067!"; "süß😋!" */ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00FC, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\250\271", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00DF, NULL); + ASSERT (ret == 4); + ASSERT (memcmp (buf, "\201\060\211\070", 4) == 0); + ASSERT (buf[4] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x1F60B, NULL); + ASSERT (ret == 4); + ASSERT (memcmp (buf, "\224\071\375\067", 4) == 0); + ASSERT (buf[4] == 'x'); + } + return 0; + + case 65001: + /* Locale encoding is CP65001 = UTF-8. */ + if (strcmp (locale_charset (), "UTF-8") != 0) + return 77; + { + /* Convert "s\303\274\303\237\360\237\230\213!"; "süß😋!" 
*/ + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00FC, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\303\274", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x00DF, NULL); + ASSERT (ret == 2); + ASSERT (memcmp (buf, "\303\237", 2) == 0); + ASSERT (buf[2] == 'x'); + + memset (buf, 'x', 8); + ret = c32rtomb (buf, 0x1F60B, NULL); + ASSERT (ret == 4); + ASSERT (memcmp (buf, "\360\237\230\213", 4) == 0); + ASSERT (buf[4] == 'x'); + } + return 0; + + default: + return 1; + } +} + +int +main (int argc, char *argv[]) +{ + int codepage = atoi (argv[argc - 1]); + int result; + int i; + + result = 77; + for (i = 1; i < argc - 1; i++) + { + int ret = test_one_locale (argv[i], codepage); + + if (ret != 77) + result = ret; + } + + if (result == 77) + { + fprintf (stderr, "Skipping test: found no locale with codepage %d\n", + codepage); + } + return result; +} + +#else + +int +main (int argc, char *argv[]) +{ + fputs ("Skipping test: not a native Windows system\n", stderr); + return 77; +} + +#endif diff --git a/tests/test-c32rtomb.c b/tests/test-c32rtomb.c new file mode 100644 index 0000000..108efe3 --- /dev/null +++ b/tests/test-c32rtomb.c @@ -0,0 +1,170 @@ +/* Test of conversion of wide character to multibyte character. + Copyright (C) 2008-2020 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. 
*/ + +/* Written by Bruno Haible <br...@clisp.org>, 2008. */ + +#include <config.h> + +#include <uchar.h> + +#include "signature.h" +SIGNATURE_CHECK (c32rtomb, size_t, (char *, char32_t, mbstate_t *)); + +#include <locale.h> +#include <stdlib.h> +#include <string.h> + +#include "macros.h" + +/* Check the multibyte character s[0..n-1]. */ +static void +check_character (const char *s, size_t n) +{ + mbstate_t state; + char32_t wc; + char buf[64]; + int iret; + size_t ret; + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char32_t) 0xBADFACE; + iret = mbrtoc32 (&wc, s, n, &state); + ASSERT (iret == n); + + ret = c32rtomb (buf, wc, NULL); + ASSERT (ret == n); + ASSERT (memcmp (buf, s, n) == 0); + + /* Test special calling convention, passing a NULL pointer. */ + ret = c32rtomb (NULL, wc, NULL); + ASSERT (ret == 1); +} + +int +main (int argc, char *argv[]) +{ + char buf[64]; + size_t ret; + + /* configure should already have checked that the locale is supported. */ + if (setlocale (LC_ALL, "") == NULL) + return 1; + + /* Test NUL character. */ + { + buf[0] = 'x'; + ret = c32rtomb (buf, 0, NULL); + ASSERT (ret == 1); + ASSERT (buf[0] == '\0'); + } + + /* Test single bytes. 
*/ + { + int c; + + for (c = 0; c < 0x100; c++) + switch (c) + { + case '\t': case '\v': case '\f': + case ' ': case '!': case '"': case '#': case '%': + case '&': case '\'': case '(': case ')': case '*': + case '+': case ',': case '-': case '.': case '/': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + case ':': case ';': case '<': case '=': case '>': + case '?': + case 'A': case 'B': case 'C': case 'D': case 'E': + case 'F': case 'G': case 'H': case 'I': case 'J': + case 'K': case 'L': case 'M': case 'N': case 'O': + case 'P': case 'Q': case 'R': case 'S': case 'T': + case 'U': case 'V': case 'W': case 'X': case 'Y': + case 'Z': + case '[': case '\\': case ']': case '^': case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': + case 'f': case 'g': case 'h': case 'i': case 'j': + case 'k': case 'l': case 'm': case 'n': case 'o': + case 'p': case 'q': case 'r': case 's': case 't': + case 'u': case 'v': case 'w': case 'x': case 'y': + case 'z': case '{': case '|': case '}': case '~': + /* c is in the ISO C "basic character set". */ + ret = c32rtomb (buf, btoc32 (c), NULL); + ASSERT (ret == 1); + ASSERT (buf[0] == (char) c); + break; + } + } + + /* Test special calling convention, passing a NULL pointer. */ + { + ret = c32rtomb (NULL, '\0', NULL); + ASSERT (ret == 1); + ret = c32rtomb (NULL, btoc32 ('x'), NULL); + ASSERT (ret == 1); + } + + if (argc > 1) + switch (argv[1][0]) + { + case '1': + /* Locale encoding is ISO-8859-1 or ISO-8859-15. */ + { + const char input[] = "B\374\337er"; /* "Büßer" */ + + check_character (input + 1, 1); + check_character (input + 2, 1); + } + return 0; + + case '2': + /* Locale encoding is UTF-8. */ + { + const char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */ + + check_character (input + 1, 2); + check_character (input + 3, 2); + check_character (input + 5, 4); + } + return 0; + + case '3': + /* Locale encoding is EUC-JP. 
*/ + { + const char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */ + + check_character (input + 1, 2); + check_character (input + 3, 2); + check_character (input + 5, 2); + } + return 0; + + case '4': + /* Locale encoding is GB18030. */ + { + const char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */ + + check_character (input + 1, 2); + check_character (input + 3, 4); + check_character (input + 7, 4); + } + return 0; + + case '5': + /* C locale; tested above. */ + return 0; + } + + return 1; +} diff --git a/tests/test-c32rtomb.sh b/tests/test-c32rtomb.sh new file mode 100755 index 0000000..2899297 --- /dev/null +++ b/tests/test-c32rtomb.sh @@ -0,0 +1,39 @@ +#!/bin/sh + +# Test in an ISO-8859-1 or ISO-8859-15 locale. +: ${LOCALE_FR=fr_FR} +if test $LOCALE_FR != none; then + LC_ALL=$LOCALE_FR \ + ${CHECKER} ./test-c32rtomb${EXEEXT} 1 \ + || exit 1 +fi + +# Test whether a specific UTF-8 locale is installed. +: ${LOCALE_FR_UTF8=fr_FR.UTF-8} +if test $LOCALE_FR_UTF8 != none; then + LC_ALL=$LOCALE_FR_UTF8 \ + ${CHECKER} ./test-c32rtomb${EXEEXT} 2 \ + || exit 1 +fi + +# Test whether a specific EUC-JP locale is installed. +: ${LOCALE_JA=ja_JP} +if test $LOCALE_JA != none; then + LC_ALL=$LOCALE_JA \ + ${CHECKER} ./test-c32rtomb${EXEEXT} 3 \ + || exit 1 +fi + +# Test whether a specific GB18030 locale is installed. +: ${LOCALE_ZH_CN=zh_CN.GB18030} +if test $LOCALE_ZH_CN != none; then + LC_ALL=$LOCALE_ZH_CN \ + ${CHECKER} ./test-c32rtomb${EXEEXT} 4 \ + || exit 1 +fi + +# Test in the POSIX locale. +LC_ALL=C ${CHECKER} ./test-c32rtomb${EXEEXT} 5 || exit 1 +LC_ALL=POSIX ${CHECKER} ./test-c32rtomb${EXEEXT} 5 || exit 1 + +exit 0 -- 2.7.4
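The round-trip idea behind check_character() in tests/test-c32rtomb.c can also be tried outside the gnulib test harness. The following is a minimal sketch, not part of the patch: roundtrip_ok is a hypothetical helper name, it calls the libc's own mbrtoc32() and c32rtomb() directly (so it assumes a platform where these behave sanely), and it returns a status instead of using the ASSERT macro from macros.h.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char32_t, mbrtoc32, c32rtomb */
#include <wchar.h>    /* mbstate_t */

/* Hypothetical helper, not from the patch.
   Return 1 if the multibyte character s[0..n-1] converts to a single
   char32_t and back to the same n bytes in the current locale,
   0 otherwise.  */
static int
roundtrip_ok (const char *s, size_t n)
{
  mbstate_t state;
  char32_t c32;
  char buf[MB_LEN_MAX];
  size_t ret;

  assert (s != NULL);

  /* Multibyte character -> char32_t, starting from the initial state.  */
  memset (&state, '\0', sizeof (mbstate_t));
  ret = mbrtoc32 (&c32, s, n, &state);
  if (ret != n)
    return 0;

  /* char32_t -> multibyte character, again from the initial state.  */
  memset (&state, '\0', sizeof (mbstate_t));
  ret = c32rtomb (buf, c32, &state);
  return ret == n && memcmp (buf, s, n) == 0;
}
```

Without a prior setlocale() call the program runs in the "C" locale, where only characters such as "A" are expected to round-trip; for the non-ASCII inputs used in the tests above, the caller would first need setlocale (LC_ALL, "") with a matching locale installed.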
From d6f8671505956401691e3c35d19499470f582a88 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Thu, 9 Jan 2020 02:04:07 +0100
Subject: [PATCH 4/4] c32tob: Make consistent with mbrtoc32.

* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
use c32rtomb, not wctob.
* modules/c32tob (Files): Add m4/mbrtoc32.m4.
(Depends-on): Add c32rtomb.
(configure.ac): Require gl_MBRTOC32_SANITYCHECK.
---
 ChangeLog      | 10 ++++++++++
 lib/c32tob.c   | 19 ++++++++++++++++++-
 modules/c32tob |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index c303d41..9c3f603 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2020-01-08  Bruno Haible  <br...@clisp.org>
 
+	c32tob: Make consistent with mbrtoc32.
+	* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
+	(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
+	use c32rtomb, not wctob.
+	* modules/c32tob (Files): Add m4/mbrtoc32.m4.
+	(Depends-on): Add c32rtomb.
+	(configure.ac): Require gl_MBRTOC32_SANITYCHECK.
+
+2020-01-08  Bruno Haible  <br...@clisp.org>
+
 	c32rtomb: Add tests.
 	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
 	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
diff --git a/lib/c32tob.c b/lib/c32tob.c
index 4da438f..55f61c7 100644
--- a/lib/c32tob.c
+++ b/lib/c32tob.c
@@ -21,10 +21,27 @@
 /* Specification.  */
 #include <uchar.h>
 
+#include <stdio.h>
+#include <string.h>
+#include <wchar.h>
+
 int
 c32tob (wint_t wc)
 {
-#if _GL_LARGE_CHAR32_T
+#if HAVE_WORKING_MBRTOC32 && !defined __GLIBC__
+  /* The char32_t encoding of a multibyte character may be different than its
+     wchar_t encoding.  */
+  if (wc != WEOF)
+    {
+      mbstate_t state;
+      char buf[8];
+
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (c32rtomb (buf, wc, &state) == 1)
+        return (unsigned char) buf[0];
+    }
+  return EOF;
+#elif _GL_LARGE_CHAR32_T
   /* In all known encodings, unibyte characters correspond only to
      characters in the BMP.  */
   if (wc != WEOF && (wchar_t) wc == wc)
diff --git a/modules/c32tob b/modules/c32tob
index 3ef42ba..42e18a9 100644
--- a/modules/c32tob
+++ b/modules/c32tob
@@ -3,12 +3,15 @@ c32tob() function: convert 32-bit wide character to unibyte character.
 
 Files:
 lib/c32tob.c
+m4/mbrtoc32.m4
 
 Depends-on:
 uchar
+c32rtomb
 wctob
 
 configure.ac:
+AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
 gl_UCHAR_MODULE_INDICATOR([c32tob])
 
 Makefile.am:
-- 
2.7.4
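The new branch in c32tob() can be read in isolation roughly as follows. This is a sketch under stated assumptions, not the gnulib source: sketch_c32tob is a hypothetical name, the HAVE_WORKING_MBRTOC32 and __GLIBC__ conditionals are left out, and the buffer is sized with MB_LEN_MAX rather than the 8 bytes used in the patch.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <string.h>
#include <uchar.h>    /* c32rtomb */
#include <wchar.h>    /* wint_t, WEOF, mbstate_t */

/* Hypothetical stand-in for the patched c32tob().
   Return the single byte that encodes WC in the current locale,
   or EOF if WC is WEOF or does not map to exactly one byte.  */
static int
sketch_c32tob (wint_t wc)
{
  if (wc != WEOF)
    {
      mbstate_t state;
      char buf[MB_LEN_MAX];

      /* Convert through c32rtomb, so that the char32_t encoding is used
         even where it differs from the wchar_t encoding.  */
      memset (&state, '\0', sizeof (mbstate_t));
      if (c32rtomb (buf, wc, &state) == 1)
        return (unsigned char) buf[0];
    }
  return EOF;
}
```

Going through c32rtomb rather than wctob is what keeps the result consistent with mbrtoc32 on platforms where the two encodings differ; the cost is one multibyte conversion per call.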