Ludovic Courtès wrote: > the excerpt of `u16-conv-from-enc.c' > that I quoted made me think that, e.g., "UTF-16BE" and "UTF-16LE" were > only known to work on Glibc >= 2.2: > > /* Name of UTF-16 encoding with machine dependent endianness and alignment. > */ > #if defined _LIBICONV_VERSION || (__GLIBC__ > 2) || (__GLIBC__ == 2 && > __GLIBC_MINOR__ >= 2) > # ifdef WORDS_BIGENDIAN > # define UTF16_NAME "UTF-16BE" > # else > # define UTF16_NAME "UTF-16LE" > # endif > #endif > > Likewise, `u-conv-from-enc.h' contains alternate code for systems where > `UTF_NAME' is undefined (i.e., typically on non-GNU systems).
Indeed. The support of UTF-16BE/LE in non-GNU iconv implementation is not good: - On Solaris >= 9, UTF-{16,32}{BE,LE} are fully supported. - On AIX 5.1, only UCS-2 is recognized, and it is actually UCS-2BE. - On AIX 5.2, only UCS-2, UTF-16, UTF-32 are recognized, and they are actually UCS-2BE, UTF-16BE, UTF-32BE. - On IRIX 6.5, only UCS-2, UTF-16 are recognized, and they are actually UCS-2BE, UTF-16BE. - On HP-UX 11, only the names ucs2, ucs4 are recognized (lowercase! no dash!), and they are actually UCS-2BE, UCS-4BE. - On OSF/1 4.0, no UTF conversions are supported at all, not even UTF-8. - On OSF/1 5.1, all of UCS-2, UTF-16, UTF-32, UTF-16BE, UTF-16LE, UTF-32LE, UTF-32BE are recognized, but the latter behave incorrectly since they add a BOM. - On NetBSD 3.0, in conversion from UTF-16BE to UTF-8, the iconv() function returns nonsense values. > Therefore, `mem_iconveh ()' doesn't seem appropriate since there is > AFAIUI no portable way to specify, say, "UTF-16{LE,BE}" as TO_CODESET... > which led me to suggest that we might have to provide endianness-aware > conversion procedures. > > Does it clarify things a bit? :-) Yes, it is clear what you mean. But adding a new API for a thing that mem_cd_iconveh should be able to do, just because there is a portability problem, is against gnulib's general approach. In gnulib we solve this by fixing the portability problem, not by adding new API. I'm adding a module 'iconv_open-utf' that enhances iconv_open() so that it supports conversion between UTF-8 and UTF-{16,32}{BE,LE}. Except on platform where iconv is entirely absent or unusable (such as OSF/1 4.0), this module allows you to use the mem_iconveh() function for doing conversion from an arbitrary encoding to UTF-{16,32}{BE,LE} or vice versa. 2007-10-14 Bruno Haible <[EMAIL PROTECTED]> Enhance iconv_open to support UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE. * modules/iconv_open-utf: New file. * lib/iconv.in.h (_ICONV_UTF8_UTF*, _ICONV_UTF*_UTF8): New macros. (iconv, iconv_close): New declarations. * lib/iconv_open.c: Include c-strcase.h. Don't require ICONV_FLAVOR to be defined. (iconv_open): Add special handling of conversion between UTF-8 and UTF-{16,32}{BE,LE}. * lib/iconv.c: New file, incorporating code from GNU libiconv 1.11. * lib/iconv_close.c: New file. * m4/iconv_open.m4 (gl_REPLACE_ICONV_OPEN): New macro, extracted from gl_FUNC_ICONV_OPEN. (gl_FUNC_ICONV_OPEN): Use it. (gl_FUNC_ICONV_OPEN_UTF): New macro. * m4/iconv_h.m4 (gl_ICONV_H_DEFAULTS): Initialize also REPLACE_ICONV and REPLACE_ICONV_UTF. * modules/iconv_open (Depends-on): Add c-strcase. (Makefile.am): Substitute also REPLACE_ICONV, REPLACE_ICONV_UTF, ICONV_CONST. * doc/functions/iconv_open.texi: Mention the iconv_open-utf module. *** modules/iconv_open-utf.orig 2003-09-23 19:59:22.000000000 +0200 --- modules/iconv_open-utf 2007-10-14 02:59:34.000000000 +0200 *************** *** 0 **** --- 1,29 ---- + Description: + Character set conversion support for UTF-{16,32}{BE,LE} encodings. + + Files: + lib/iconv.c + lib/iconv_close.c + m4/iconv_open.m4 + + Depends-on: + iconv_open + stdint + unistr/u8-mbtoucr + unistr/u8-uctomb + + configure.ac: + gl_FUNC_ICONV_OPEN_UTF + + Makefile.am: + + Include: + + Link: + + License: + LGPL + + Maintainer: + Bruno Haible + *** lib/iconv.in.h.orig 2007-10-14 12:14:52.000000000 +0200 --- lib/iconv.in.h 2007-10-13 23:35:27.000000000 +0200 *************** *** 36,41 **** --- 36,63 ---- extern iconv_t iconv_open (const char *tocode, const char *fromcode); #endif + #if @REPLACE_ICONV_UTF@ + /* Special constants for supporting UTF-{16,32}{BE,LE} encodings. + Not public. */ + # define _ICONV_UTF8_UTF16BE (iconv_t)(-161) + # define _ICONV_UTF8_UTF16LE (iconv_t)(-162) + # define _ICONV_UTF8_UTF32BE (iconv_t)(-163) + # define _ICONV_UTF8_UTF32LE (iconv_t)(-164) + # define _ICONV_UTF16BE_UTF8 (iconv_t)(-165) + # define _ICONV_UTF16LE_UTF8 (iconv_t)(-166) + # define _ICONV_UTF32BE_UTF8 (iconv_t)(-167) + # define _ICONV_UTF32LE_UTF8 (iconv_t)(-168) + #endif + + #if @REPLACE_ICONV@ + # define iconv rpl_iconv + extern size_t iconv (iconv_t cd, + @ICONV_CONST@ char **inbuf, size_t *inbytesleft, + char **outbuf, size_t *outbytesleft); + # define iconv_close rpl_iconv_close + extern int iconv_close (iconv_t cd); + #endif + #ifdef __cplusplus } *** lib/iconv_open.c.orig 2007-10-14 12:14:52.000000000 +0200 --- lib/iconv_open.c 2007-10-14 11:54:54.000000000 +0200 *************** *** 23,42 **** #include <errno.h> #include <string.h> #include "c-ctype.h" #define SIZEOF(a) (sizeof(a) / sizeof(a[0])) /* Namespace cleanliness. */ #define mapping_lookup rpl_iconv_open_mapping_lookup ! /* The macro ICONV_FLAVOR is defined to one of these. */ #define ICONV_FLAVOR_AIX "iconv_open-aix.h" #define ICONV_FLAVOR_HPUX "iconv_open-hpux.h" #define ICONV_FLAVOR_IRIX "iconv_open-irix.h" #define ICONV_FLAVOR_OSF "iconv_open-osf.h" ! #include ICONV_FLAVOR iconv_t rpl_iconv_open (const char *tocode, const char *fromcode) --- 23,45 ---- #include <errno.h> #include <string.h> #include "c-ctype.h" + #include "c-strcase.h" #define SIZEOF(a) (sizeof(a) / sizeof(a[0])) /* Namespace cleanliness. */ #define mapping_lookup rpl_iconv_open_mapping_lookup ! /* The macro ICONV_FLAVOR is defined to one of these or undefined. */ #define ICONV_FLAVOR_AIX "iconv_open-aix.h" #define ICONV_FLAVOR_HPUX "iconv_open-hpux.h" #define ICONV_FLAVOR_IRIX "iconv_open-irix.h" #define ICONV_FLAVOR_OSF "iconv_open-osf.h" ! #ifdef ICONV_FLAVOR ! # include ICONV_FLAVOR ! #endif iconv_t rpl_iconv_open (const char *tocode, const char *fromcode) *************** *** 47,52 **** --- 50,108 ---- char *fromcode_upper_end; char *tocode_upper_end; + #if REPLACE_ICONV_UTF + /* Special handling of conversion between UTF-8 and UTF-{16,32}{BE,LE}. + Do this here, before calling the real iconv_open(), because OSF/1 5.1 + iconv() to these encoding inserts a BOM, which is wrong. + We do not need to handle conversion between arbitrary encodings and + UTF-{16,32}{BE,LE}, because the 'striconveh' module implements two-step + conversion throough UTF-8. + The _ICONV_* constants are chosen to be disjoint from any iconv_t + returned by the system's iconv_open() functions. Recall that iconv_t + is a scalar type. */ + if (c_toupper (fromcode[0]) == 'U' + && c_toupper (fromcode[1]) == 'T' + && c_toupper (fromcode[2]) == 'F' + && fromcode[3] == '-') + { + if (c_toupper (tocode[0]) == 'U' + && c_toupper (tocode[1]) == 'T' + && c_toupper (tocode[2]) == 'F' + && tocode[3] == '-') + { + if (strcmp (fromcode + 4, "8") == 0) + { + if (c_strcasecmp (tocode + 4, "16BE") == 0) + return _ICONV_UTF8_UTF16BE; + if (c_strcasecmp (tocode + 4, "16LE") == 0) + return _ICONV_UTF8_UTF16LE; + if (c_strcasecmp (tocode + 4, "32BE") == 0) + return _ICONV_UTF8_UTF32BE; + if (c_strcasecmp (tocode + 4, "32LE") == 0) + return _ICONV_UTF8_UTF32LE; + } + else if (strcmp (tocode + 4, "8") == 0) + { + if (c_strcasecmp (fromcode + 4, "16BE") == 0) + return _ICONV_UTF16BE_UTF8; + if (c_strcasecmp (fromcode + 4, "16LE") == 0) + return _ICONV_UTF16LE_UTF8; + if (c_strcasecmp (fromcode + 4, "32BE") == 0) + return _ICONV_UTF32BE_UTF8; + if (c_strcasecmp (fromcode + 4, "32LE") == 0) + return _ICONV_UTF32LE_UTF8; + } + } + } + #endif + + /* Do *not* add special support for 8-bit encodings like ASCII or ISO-8859-1 + here. This would lead to programs that work in some locales (such as the + "C" or "en_US" locales) but do not work in East Asian locales. It is + better if programmers make their programs depend on GNU libiconv (except + on glibc systems), e.g. by using the AM_ICONV macro and documenting the + dependency in an INSTALL or DEPENDENCIES file. */ + /* Try with the original names first. This covers the case when fromcode or tocode is a lowercase encoding name that is understood by the system's iconv_open but not listed in our *************** *** 93,98 **** --- 149,155 ---- tocode_upper_end = q; } + #ifdef ICONV_FLAVOR /* Apply the mappings. */ { const struct mapping *m = *************** *** 106,111 **** --- 163,172 ---- tocode = (m != NULL ? m->vendor_name : tocode_upper); } + #else + fromcode = fromcode_upper; + tocode = tocode_upper; + #endif return iconv_open (tocode, fromcode); } *** lib/iconv.c.orig 2003-09-23 19:59:22.000000000 +0200 --- lib/iconv.c 2007-10-14 11:06:26.000000000 +0200 *************** *** 0 **** --- 1,450 ---- + /* Character set conversion. + Copyright (C) 1999-2001, 2007 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License along + with this program; if not, write to the Free Software Foundation, + Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ + + #include <config.h> + + /* Specification. */ + #include <iconv.h> + + #include <stddef.h> + + #if REPLACE_ICONV_UTF + # include <errno.h> + # include <stdint.h> + # include <stdlib.h> + # include "unistr.h" + # ifndef uintptr_t + # define uintptr_t unsigned long + # endif + #endif + + #if REPLACE_ICONV_UTF + + /* UTF-{16,32}{BE,LE} converters taken from GNU libiconv 1.11. */ + + /* Return code if invalid. (xxx_mbtowc) */ + # define RET_ILSEQ -1 + /* Return code if no bytes were read. (xxx_mbtowc) */ + # define RET_TOOFEW -2 + + /* Return code if invalid. (xxx_wctomb) */ + # define RET_ILUNI -1 + /* Return code if output buffer is too small. (xxx_wctomb, xxx_reset) */ + # define RET_TOOSMALL -2 + + /* + * UTF-16BE + */ + + /* Specification: RFC 2781 */ + + static int + utf16be_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n) + { + if (n >= 2) + { + ucs4_t wc = (s[0] << 8) + s[1]; + if (wc >= 0xd800 && wc < 0xdc00) + { + if (n >= 4) + { + ucs4_t wc2 = (s[2] << 8) + s[3]; + if (!(wc2 >= 0xdc00 && wc2 < 0xe000)) + return RET_ILSEQ; + *pwc = 0x10000 + ((wc - 0xd800) << 10) + (wc2 - 0xdc00); + return 4; + } + } + else if (wc >= 0xdc00 && wc < 0xe000) + { + return RET_ILSEQ; + } + else + { + *pwc = wc; + return 2; + } + } + return RET_TOOFEW; + } + + static int + utf16be_wctomb (unsigned char *r, ucs4_t wc, size_t n) + { + if (!(wc >= 0xd800 && wc < 0xe000)) + { + if (wc < 0x10000) + { + if (n >= 2) + { + r[0] = (unsigned char) (wc >> 8); + r[1] = (unsigned char) wc; + return 2; + } + else + return RET_TOOSMALL; + } + else if (wc < 0x110000) + { + if (n >= 4) + { + ucs4_t wc1 = 0xd800 + ((wc - 0x10000) >> 10); + ucs4_t wc2 = 0xdc00 + ((wc - 0x10000) & 0x3ff); + r[0] = (unsigned char) (wc1 >> 8); + r[1] = (unsigned char) wc1; + r[2] = (unsigned char) (wc2 >> 8); + r[3] = (unsigned char) wc2; + return 4; + } + else + return RET_TOOSMALL; + } + } + return RET_ILUNI; + } + + /* + * UTF-16LE + */ + + /* Specification: RFC 2781 */ + + static int + utf16le_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n) + { + if (n >= 2) + { + ucs4_t wc = s[0] + (s[1] << 8); + if (wc >= 0xd800 && wc < 0xdc00) + { + if (n >= 4) + { + ucs4_t wc2 = s[2] + (s[3] << 8); + if (!(wc2 >= 0xdc00 && wc2 < 0xe000)) + return RET_ILSEQ; + *pwc = 0x10000 + ((wc - 0xd800) << 10) + (wc2 - 0xdc00); + return 4; + } + } + else if (wc >= 0xdc00 && wc < 0xe000) + { + return RET_ILSEQ; + } + else + { + *pwc = wc; + return 2; + } + } + return RET_TOOFEW; + } + + static int + utf16le_wctomb (unsigned char *r, ucs4_t wc, size_t n) + { + if (!(wc >= 0xd800 && wc < 0xe000)) + { + if (wc < 0x10000) + { + if (n >= 2) + { + r[0] = (unsigned char) wc; + r[1] = (unsigned char) (wc >> 8); + return 2; + } + else + return RET_TOOSMALL; + } + else if (wc < 0x110000) + { + if (n >= 4) + { + ucs4_t wc1 = 0xd800 + ((wc - 0x10000) >> 10); + ucs4_t wc2 = 0xdc00 + ((wc - 0x10000) & 0x3ff); + r[0] = (unsigned char) wc1; + r[1] = (unsigned char) (wc1 >> 8); + r[2] = (unsigned char) wc2; + r[3] = (unsigned char) (wc2 >> 8); + return 4; + } + else + return RET_TOOSMALL; + } + } + return RET_ILUNI; + } + + /* + * UTF-32BE + */ + + /* Specification: Unicode 3.1 Standard Annex #19 */ + + static int + utf32be_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n) + { + if (n >= 4) + { + ucs4_t wc = (s[0] << 24) + (s[1] << 16) + (s[2] << 8) + s[3]; + if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000)) + { + *pwc = wc; + return 4; + } + else + return RET_ILSEQ; + } + return RET_TOOFEW; + } + + static int + utf32be_wctomb (unsigned char *r, ucs4_t wc, size_t n) + { + if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000)) + { + if (n >= 4) + { + r[0] = 0; + r[1] = (unsigned char) (wc >> 16); + r[2] = (unsigned char) (wc >> 8); + r[3] = (unsigned char) wc; + return 4; + } + else + return RET_TOOSMALL; + } + return RET_ILUNI; + } + + /* + * UTF-32LE + */ + + /* Specification: Unicode 3.1 Standard Annex #19 */ + + static int + utf32le_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n) + { + if (n >= 4) + { + ucs4_t wc = s[0] + (s[1] << 8) + (s[2] << 16) + (s[3] << 24); + if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000)) + { + *pwc = wc; + return 4; + } + else + return RET_ILSEQ; + } + return RET_TOOFEW; + } + + static int + utf32le_wctomb (unsigned char *r, ucs4_t wc, size_t n) + { + if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000)) + { + if (n >= 4) + { + r[0] = (unsigned char) wc; + r[1] = (unsigned char) (wc >> 8); + r[2] = (unsigned char) (wc >> 16); + r[3] = 0; + return 4; + } + else + return RET_TOOSMALL; + } + return RET_ILUNI; + } + + #endif + + size_t + iconv (iconv_t cd, + ICONV_CONST char **inbuf, size_t *inbytesleft, + char **outbuf, size_t *outbytesleft) + #undef iconv + { + #if REPLACE_ICONV_UTF + switch ((uintptr_t) cd) + { + { + int (*xxx_wctomb) (unsigned char *, ucs4_t, size_t); + + case (uintptr_t) _ICONV_UTF8_UTF16BE: + xxx_wctomb = utf16be_wctomb; + goto loop_from_utf8; + case (uintptr_t) _ICONV_UTF8_UTF16LE: + xxx_wctomb = utf16le_wctomb; + goto loop_from_utf8; + case (uintptr_t) _ICONV_UTF8_UTF32BE: + xxx_wctomb = utf32be_wctomb; + goto loop_from_utf8; + case (uintptr_t) _ICONV_UTF8_UTF32LE: + xxx_wctomb = utf32le_wctomb; + goto loop_from_utf8; + + loop_from_utf8: + if (inbuf == NULL || *inbuf == NULL) + return 0; + { + ICONV_CONST char *inptr = *inbuf; + size_t inleft = *inbytesleft; + char *outptr = *outbuf; + size_t outleft = *outbytesleft; + size_t res = 0; + while (inleft > 0) + { + ucs4_t uc; + int m = u8_mbtoucr (&uc, (const uint8_t *) inptr, inleft); + if (m <= 0) + { + if (m == -1) + { + errno = EILSEQ; + res = (size_t)(-1); + break; + } + if (m == -2) + { + errno = EINVAL; + res = (size_t)(-1); + break; + } + abort (); + } + else + { + int n = xxx_wctomb ((uint8_t *) outptr, uc, outleft); + if (n < 0) + { + if (n == RET_ILUNI) + { + errno = EILSEQ; + res = (size_t)(-1); + break; + } + if (n == RET_TOOSMALL) + { + errno = E2BIG; + res = (size_t)(-1); + break; + } + abort (); + } + else + { + inptr += m; + inleft -= m; + outptr += n; + outleft -= n; + } + } + } + *inbuf = inptr; + *inbytesleft = inleft; + *outbuf = outptr; + *outbytesleft = outleft; + return res; + } + } + + { + int (*xxx_mbtowc) (ucs4_t *, const unsigned char *, size_t); + + case (uintptr_t) _ICONV_UTF16BE_UTF8: + xxx_mbtowc = utf16be_mbtowc; + goto loop_to_utf8; + case (uintptr_t) _ICONV_UTF16LE_UTF8: + xxx_mbtowc = utf16le_mbtowc; + goto loop_to_utf8; + case (uintptr_t) _ICONV_UTF32BE_UTF8: + xxx_mbtowc = utf32be_mbtowc; + goto loop_to_utf8; + case (uintptr_t) _ICONV_UTF32LE_UTF8: + xxx_mbtowc = utf32le_mbtowc; + goto loop_to_utf8; + + loop_to_utf8: + if (inbuf == NULL || *inbuf == NULL) + return 0; + { + ICONV_CONST char *inptr = *inbuf; + size_t inleft = *inbytesleft; + char *outptr = *outbuf; + size_t outleft = *outbytesleft; + size_t res = 0; + while (inleft > 0) + { + ucs4_t uc; + int m = xxx_mbtowc (&uc, (const uint8_t *) inptr, inleft); + if (m <= 0) + { + if (m == RET_ILSEQ) + { + errno = EILSEQ; + res = (size_t)(-1); + break; + } + if (m == RET_TOOFEW) + { + errno = EINVAL; + res = (size_t)(-1); + break; + } + abort (); + } + else + { + int n = u8_uctomb ((uint8_t *) outptr, uc, outleft); + if (n < 0) + { + if (n == -1) + { + errno = EILSEQ; + res = (size_t)(-1); + break; + } + if (n == -2) + { + errno = E2BIG; + res = (size_t)(-1); + break; + } + abort (); + } + else + { + inptr += m; + inleft -= m; + outptr += n; + outleft -= n; + } + } + } + *inbuf = inptr; + *inbytesleft = inleft; + *outbuf = outptr; + *outbytesleft = outleft; + return res; + } + } + } + #endif + return iconv (cd, inbuf, inbytesleft, outbuf, outbytesleft); + } *** lib/iconv_close.c.orig 2003-09-23 19:59:22.000000000 +0200 --- lib/iconv_close.c 2007-10-14 00:01:42.000000000 +0200 *************** *** 0 **** --- 1,47 ---- + /* Character set conversion. + Copyright (C) 2007 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License along + with this program; if not, write to the Free Software Foundation, + Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ + + #include <config.h> + + /* Specification. */ + #include <iconv.h> + + #include <stdint.h> + #ifndef uintptr_t + # define uintptr_t unsigned long + #endif + + int + iconv_close (iconv_t cd) + #undef iconv_close + { + #if REPLACE_ICONV_UTF + switch ((uintptr_t) cd) + { + case (uintptr_t) _ICONV_UTF8_UTF16BE: + case (uintptr_t) _ICONV_UTF8_UTF16LE: + case (uintptr_t) _ICONV_UTF8_UTF32BE: + case (uintptr_t) _ICONV_UTF8_UTF32LE: + case (uintptr_t) _ICONV_UTF16BE_UTF8: + case (uintptr_t) _ICONV_UTF16LE_UTF8: + case (uintptr_t) _ICONV_UTF32BE_UTF8: + case (uintptr_t) _ICONV_UTF32LE_UTF8: + return 0; + } + #endif + return iconv_close (cd); + } *** m4/iconv_open.m4.orig 2007-10-14 12:14:52.000000000 +0200 --- m4/iconv_open.m4 2007-10-14 04:33:07.000000000 +0200 *************** *** 1,4 **** ! # iconv_open.m4 serial 1 dnl Copyright (C) 2007 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, --- 1,4 ---- ! # iconv_open.m4 serial 2 dnl Copyright (C) 2007 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, *************** *** 30,40 **** AC_DEFINE_UNQUOTED([ICONV_FLAVOR], [$iconv_flavor], [Define to a symbolic name denoting the flavor of iconv_open() implementation.]) ! REPLACE_ICONV_OPEN=1 ! AC_LIBOBJ([iconv_open]) ! ICONV_H='iconv.h' fi fi fi ]) --- 30,237 ---- AC_DEFINE_UNQUOTED([ICONV_FLAVOR], [$iconv_flavor], [Define to a symbolic name denoting the flavor of iconv_open() implementation.]) ! gl_REPLACE_ICONV_OPEN fi fi fi ]) + AC_DEFUN([gl_REPLACE_ICONV_OPEN], + [ + REPLACE_ICONV_OPEN=1 + AC_LIBOBJ([iconv_open]) + ICONV_H='iconv.h' + ]) + + AC_DEFUN([gl_FUNC_ICONV_OPEN_UTF], + [ + AC_REQUIRE([gl_FUNC_ICONV_OPEN]) + AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles + AC_REQUIRE([gl_ICONV_H_DEFAULTS]) + if test "$am_cv_func_iconv" = yes; then + if test -n "$am_cv_proto_iconv_arg1"; then + ICONV_CONST="const" + else + ICONV_CONST= + fi + AC_SUBST([ICONV_CONST]) + AC_CACHE_CHECK([whether iconv supports conversion between UTF-8 and UTF-{16,32}{BE,LE}], + [gl_func_iconv_supports_utf], + [ + save_LIBS="$LIBS" + LIBS="$LIBS $LIBICONV" + AC_TRY_RUN([ + #include <iconv.h> + #include <errno.h> + #include <stdio.h> + #include <stdlib.h> + #include <string.h> + #define ASSERT(expr) if (!(expr)) return 1; + int main () + { + /* Test conversion from UTF-8 to UTF-16BE with no errors. */ + { + static const char input[] = + "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]"; + static const char expected[] = + "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]"; + iconv_t cd; + char buf[100]; + const char *inptr; + size_t inbytesleft; + char *outptr; + size_t outbytesleft; + size_t res; + cd = iconv_open ("UTF-16BE", "UTF-8"); + ASSERT (cd != (iconv_t)(-1)); + inptr = input; + inbytesleft = sizeof (input) - 1; + outptr = buf; + outbytesleft = sizeof (buf); + res = iconv (cd, + (ICONV_CONST char **) &inptr, &inbytesleft, + &outptr, &outbytesleft); + ASSERT (res == 0 && inbytesleft == 0); + ASSERT (outptr == buf + (sizeof (expected) - 1)); + ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0); + ASSERT (iconv_close (cd) == 0); + } + /* Test conversion from UTF-8 to UTF-16LE with no errors. */ + { + static const char input[] = + "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]"; + static const char expected[] = + "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000"; + iconv_t cd; + char buf[100]; + const char *inptr; + size_t inbytesleft; + char *outptr; + size_t outbytesleft; + size_t res; + cd = iconv_open ("UTF-16LE", "UTF-8"); + ASSERT (cd != (iconv_t)(-1)); + inptr = input; + inbytesleft = sizeof (input) - 1; + outptr = buf; + outbytesleft = sizeof (buf); + res = iconv (cd, + (ICONV_CONST char **) &inptr, &inbytesleft, + &outptr, &outbytesleft); + ASSERT (res == 0 && inbytesleft == 0); + ASSERT (outptr == buf + (sizeof (expected) - 1)); + ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0); + ASSERT (iconv_close (cd) == 0); + } + /* Test conversion from UTF-8 to UTF-32BE with no errors. */ + { + static const char input[] = + "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]"; + static const char expected[] = + "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]"; + iconv_t cd; + char buf[100]; + const char *inptr; + size_t inbytesleft; + char *outptr; + size_t outbytesleft; + size_t res; + cd = iconv_open ("UTF-32BE", "UTF-8"); + ASSERT (cd != (iconv_t)(-1)); + inptr = input; + inbytesleft = sizeof (input) - 1; + outptr = buf; + outbytesleft = sizeof (buf); + res = iconv (cd, + (ICONV_CONST char **) &inptr, &inbytesleft, + &outptr, &outbytesleft); + ASSERT (res == 0 && inbytesleft == 0); + ASSERT (outptr == buf + (sizeof (expected) - 1)); + ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0); + ASSERT (iconv_close (cd) == 0); + } + /* Test conversion from UTF-8 to UTF-32LE with no errors. */ + { + static const char input[] = + "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]"; + static const char expected[] = + "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000"; + iconv_t cd; + char buf[100]; + const char *inptr; + size_t inbytesleft; + char *outptr; + size_t outbytesleft; + size_t res; + cd = iconv_open ("UTF-32LE", "UTF-8"); + ASSERT (cd != (iconv_t)(-1)); + inptr = input; + inbytesleft = sizeof (input) - 1; + outptr = buf; + outbytesleft = sizeof (buf); + res = iconv (cd, + (ICONV_CONST char **) &inptr, &inbytesleft, + &outptr, &outbytesleft); + ASSERT (res == 0 && inbytesleft == 0); + ASSERT (outptr == buf + (sizeof (expected) - 1)); + ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0); + ASSERT (iconv_close (cd) == 0); + } + /* Test conversion from UTF-16BE to UTF-8 with no errors. + This test fails on NetBSD 3.0. */ + { + static const char input[] = + "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]"; + static const char expected[] = + "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]"; + iconv_t cd; + char buf[100]; + const char *inptr; + size_t inbytesleft; + char *outptr; + size_t outbytesleft; + size_t res; + cd = iconv_open ("UTF-8", "UTF-16BE"); + ASSERT (cd != (iconv_t)(-1)); + inptr = input; + inbytesleft = sizeof (input) - 1; + outptr = buf; + outbytesleft = sizeof (buf); + res = iconv (cd, + (ICONV_CONST char **) &inptr, &inbytesleft, + &outptr, &outbytesleft); + ASSERT (res == 0 && inbytesleft == 0); + ASSERT (outptr == buf + (sizeof (expected) - 1)); + ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0); + ASSERT (iconv_close (cd) == 0); + } + return 0; + }], [gl_func_iconv_supports_utf=yes], [gl_func_iconv_supports_utf=no], + [ + dnl We know that GNU libiconv, GNU libc, and Solaris >= 9 do. + dnl OSF/1 5.1 has these encodings, but inserts a BOM in the "to" + dnl direction. + gl_func_iconv_supports_utf=no + if test $gl_func_iconv_gnu = yes; then + gl_func_iconv_supports_utf=yes + else + changequote(,)dnl + case "$host_os" in + solaris2.9 | solaris2.1[0-9]) gl_func_iconv_supports_utf=yes ;; + esac + changequote([,])dnl + fi + ]) + LIBS="$save_LIBS" + ]) + if test $gl_func_iconv_supports_utf = no; then + REPLACE_ICONV_UTF=1 + AC_DEFINE([REPLACE_ICONV_UTF], 1, + [Define if the iconv() functions are enhanced to handle the UTF-{16,32}{BE,LE} encodings.]) + REPLACE_ICONV=1 + gl_REPLACE_ICONV_OPEN + AC_LIBOBJ([iconv]) + AC_LIBOBJ([iconv_close]) + fi + fi + ]) *** m4/iconv_h.m4.orig 2007-10-14 12:14:52.000000000 +0200 --- m4/iconv_h.m4 2007-10-14 02:42:48.000000000 +0200 *************** *** 1,4 **** ! # iconv_h.m4 serial 2 dnl Copyright (C) 2007 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, --- 1,4 ---- ! # iconv_h.m4 serial 3 dnl Copyright (C) 2007 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, *************** *** 22,26 **** --- 22,28 ---- AC_DEFUN([gl_ICONV_H_DEFAULTS], [ dnl Assume proper GNU behavior unless another module says otherwise. + REPLACE_ICONV=0; AC_SUBST([REPLACE_ICONV]) REPLACE_ICONV_OPEN=0; AC_SUBST([REPLACE_ICONV_OPEN]) + REPLACE_ICONV_UTF=0; AC_SUBST([REPLACE_ICONV_UTF]) ]) *** modules/iconv_open.orig 2007-10-14 12:14:52.000000000 +0200 --- modules/iconv_open 2007-10-14 01:10:41.000000000 +0200 *************** *** 15,20 **** --- 15,21 ---- include_next iconv c-ctype + c-strcase configure.ac: gl_ICONV_H *************** *** 30,36 **** --- 31,40 ---- { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */' && \ sed -e 's/@''INCLUDE_NEXT''@/$(INCLUDE_NEXT)/g' \ -e 's|@''NEXT_ICONV_H''@|$(NEXT_ICONV_H)|g' \ + -e 's|@''ICONV_CONST''@|$(ICONV_CONST)|g' \ + -e 's|@''REPLACE_ICONV''@|$(REPLACE_ICONV)|g' \ -e 's|@''REPLACE_ICONV_OPEN''@|$(REPLACE_ICONV_OPEN)|g' \ + -e 's|@''REPLACE_ICONV_UTF''@|$(REPLACE_ICONV_UTF)|g' \ < $(srcdir)/iconv.in.h; \ } > [EMAIL PROTECTED] mv [EMAIL PROTECTED] $@ *** doc/functions/iconv_open.texi.orig 2007-10-14 12:14:52.000000000 +0200 --- doc/functions/iconv_open.texi 2007-10-14 03:09:31.000000000 +0200 *************** *** 4,10 **** POSIX specification: @url{http://www.opengroup.org/susv3xsh/iconv_open.html} ! Gnulib module: iconv and iconv_open Portability problems fixed by either Gnulib module @code{iconv} or @code{iconv_open}: @itemize --- 4,10 ---- POSIX specification: @url{http://www.opengroup.org/susv3xsh/iconv_open.html} ! Gnulib module: iconv, iconv_open, iconv_open-utf Portability problems fixed by either Gnulib module @code{iconv} or @code{iconv_open}: @itemize *************** *** 23,28 **** --- 23,36 ---- AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1. @end itemize + Portability problems fixed by Gnulib module @code{iconv_open-utf}: + @itemize + @item + This function does not support the encodings UTF-16BE, UTF-16LE, UTF-32BE, + UTF-32LE on many platforms: + AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 8. + @end itemize + Portability problems not fixed by Gnulib: @itemize @item