On NetBSD 5.0, the u8-strcoll test failed, because it used iconv with transliteration, and the results depend too much on the iconv implementation being used: "•" maps to "o" with glibc or libiconv, but to "?" with NetBSD iconv. The fix is to rely only on strict (lossless) iconv conversion.
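
To illustrate what "strict" conversion means at the iconv level, here is a minimal sketch. It is not the gnulib code: strict_convert is a hypothetical helper, and it assumes the glibc/POSIX prototype of iconv() with a `char **` input argument (some older systems declare it `const char **`). It opens the converter without a "//TRANSLIT" suffix and treats both a hard failure and a nonzero count of non-reversible conversions as an error, so the outcome should not depend on how a given iconv implementation substitutes characters.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: convert the NUL-terminated
   string IN from encoding FROM to encoding TO without any loss.
   Returns the number of bytes stored in OUT (excluding the trailing NUL),
   or -1 with errno set if the conversion fails or would be lossy.  */
long
strict_convert (const char *to, const char *from,
                const char *in, char *out, size_t outsize)
{
  assert (out != NULL && outsize > 0);

  /* No "//TRANSLIT" or "//IGNORE" suffix: ask for a plain conversion.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char *inptr = (char *) in;      /* iconv() does not modify the input */
  size_t inleft = strlen (in);
  char *outptr = out;
  size_t outleft = outsize - 1;   /* reserve room for the NUL */

  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  if (res == 0)
    /* Flush any pending shift sequence of a stateful encoding.  */
    res = iconv (cd, NULL, NULL, &outptr, &outleft);
  int saved_errno = errno;
  iconv_close (cd);

  if (res == (size_t) -1)
    {
      /* Typically EILSEQ for an unconvertible character.  */
      errno = saved_errno;
      return -1;
    }
  if (res > 0)
    {
      /* Some implementations substitute characters and only report
         them in the return value; treat that as a failure too.  */
      errno = EILSEQ;
      return -1;
    }

  *outptr = '\0';
  return (long) (outptr - out);
}
```

With such a helper, converting "•" (U+2022) from UTF-8 to ASCII is reported as a failure instead of silently yielding "o" or "?", which is the behaviour the patch below obtains by passing iconveh_error.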

2010-05-24  Bruno Haible  <br...@clisp.org>

	Don't use conversion with transliteration in u{8,16,32}_strcoll.
	* lib/unistr/u-strcoll.h (FUNC): Use U_STRCONV_TO_ENCODING with
	iconveh_error argument.
	* lib/unistr/u8-strcoll.c: Define U_STRCONV_TO_ENCODING instead of
	U_STRCONV_TO_LOCALE.
	* lib/unistr/u16-strcoll.c: Likewise.
	* lib/unistr/u32-strcoll.c: Likewise.
	* modules/unistr/u8-strcoll (Depends-on): Add
	uniconv/u8-strconv-to-enc, localcharset. Remove
	uniconv/u8-strconv-to-locale.
	(configure.ac): Bump version number.
	* modules/unistr/u16-strcoll (Depends-on): Add
	uniconv/u16-strconv-to-enc, localcharset. Remove
	uniconv/u16-strconv-to-locale.
	(configure.ac): Bump version number.
	* modules/unistr/u32-strcoll (Depends-on): Add
	uniconv/u32-strconv-to-enc, localcharset. Remove
	uniconv/u32-strconv-to-locale.
	(configure.ac): Bump version number.

--- lib/unistr/u-strcoll.h.orig	Mon May 24 22:55:35 2010
+++ lib/unistr/u-strcoll.h	Mon May 24 22:43:30 2010
@@ -23,14 +23,19 @@
      When it fails, it sets errno, but also returns a meaningful return value,
      for the sake of callers which ignore errno.  */
   int final_errno = errno;
+  const char *encoding = locale_charset ();
   char *sl1;
   char *sl2;
   int result;
 
-  sl1 = U_STRCONV_TO_LOCALE (s1);
+  /* Pass iconveh_error here, not iconveh_question_mark.  Otherwise the
+     conversion to locale encoding can do transliteration or map some
+     characters to question marks, leading to results that depend on the
+     iconv() implementation and are not obvious.  */
+  sl1 = U_STRCONV_TO_ENCODING (s1, encoding, iconveh_error);
   if (sl1 != NULL)
     {
-      sl2 = U_STRCONV_TO_LOCALE (s2);
+      sl2 = U_STRCONV_TO_ENCODING (s2, encoding, iconveh_error);
       if (sl2 != NULL)
         {
           /* Compare sl1 and sl2.  */
@@ -41,10 +46,10 @@
               /* strcoll succeeded.  */
               free (sl1);
               free (sl2);
-              /* The conversion to locale encoding can do transliteration or
-                 map some characters to question marks.  Therefore sl1 and sl2
-                 may be equal when s1 and s2 were in fact different.  Return a
-                 nonzero result in this case.  */
+              /* The conversion to locale encoding can drop Unicode TAG
+                 characters.  Therefore sl1 and sl2 may be equal when s1
+                 and s2 were in fact different.  Return a nonzero result
+                 in this case.  */
               if (result == 0)
                 result = U_STRCMP (s1, s2);
             }
@@ -68,7 +73,7 @@
   else
     {
       final_errno = errno;
-      sl2 = U_STRCONV_TO_LOCALE (s2);
+      sl2 = U_STRCONV_TO_ENCODING (s2, encoding, iconveh_error);
       if (sl2 != NULL)
         {
           /* s2 could be converted to locale encoding, s1 not.  */
--- lib/unistr/u8-strcoll.c.orig	Mon May 24 22:55:35 2010
+++ lib/unistr/u8-strcoll.c	Mon May 24 22:43:33 2010
@@ -29,5 +29,5 @@
 #define FUNC u8_strcoll
 #define UNIT uint8_t
 #define U_STRCMP u8_strcmp
-#define U_STRCONV_TO_LOCALE u8_strconv_to_locale
+#define U_STRCONV_TO_ENCODING u8_strconv_to_encoding
 #include "u-strcoll.h"
--- lib/unistr/u16-strcoll.c.orig	Mon May 24 22:55:35 2010
+++ lib/unistr/u16-strcoll.c	Mon May 24 22:43:35 2010
@@ -29,5 +29,5 @@
 #define FUNC u16_strcoll
 #define UNIT uint16_t
 #define U_STRCMP u16_strcmp
-#define U_STRCONV_TO_LOCALE u16_strconv_to_locale
+#define U_STRCONV_TO_ENCODING u16_strconv_to_encoding
 #include "u-strcoll.h"
--- lib/unistr/u32-strcoll.c.orig	Mon May 24 22:55:35 2010
+++ lib/unistr/u32-strcoll.c	Mon May 24 22:43:34 2010
@@ -29,5 +29,5 @@
 #define FUNC u32_strcoll
 #define UNIT uint32_t
 #define U_STRCMP u32_strcmp
-#define U_STRCONV_TO_LOCALE u32_strconv_to_locale
+#define U_STRCONV_TO_ENCODING u32_strconv_to_encoding
 #include "u-strcoll.h"
--- modules/unistr/u8-strcoll.orig	Mon May 24 22:55:35 2010
+++ modules/unistr/u8-strcoll	Mon May 24 22:46:40 2010
@@ -8,10 +8,11 @@
 
 Depends-on:
 unistr/base
 unistr/u8-strcmp
-uniconv/u8-strconv-to-locale
+uniconv/u8-strconv-to-enc
+localcharset
 
 configure.ac:
-gl_LIBUNISTRING_LIBSOURCE([0.9.3], [unistr/u8-strcoll.c])
+gl_LIBUNISTRING_LIBSOURCE([0.9.4], [unistr/u8-strcoll.c])
 
 Makefile.am:
--- modules/unistr/u16-strcoll.orig	Mon May 24 22:55:35 2010
+++ modules/unistr/u16-strcoll	Mon May 24 22:46:34 2010
@@ -8,10 +8,11 @@
 
 Depends-on:
 unistr/base
 unistr/u16-strcmp
-uniconv/u16-strconv-to-locale
+uniconv/u16-strconv-to-enc
+localcharset
 
 configure.ac:
-gl_LIBUNISTRING_LIBSOURCE([0.9.3], [unistr/u16-strcoll.c])
+gl_LIBUNISTRING_LIBSOURCE([0.9.4], [unistr/u16-strcoll.c])
 
 Makefile.am:
--- modules/unistr/u32-strcoll.orig	Mon May 24 22:55:35 2010
+++ modules/unistr/u32-strcoll	Mon May 24 22:46:29 2010
@@ -8,10 +8,11 @@
 
 Depends-on:
 unistr/base
 unistr/u32-strcmp
-uniconv/u32-strconv-to-locale
+uniconv/u32-strconv-to-enc
+localcharset
 
 configure.ac:
-gl_LIBUNISTRING_LIBSOURCE([0.9.3], [unistr/u32-strcoll.c])
+gl_LIBUNISTRING_LIBSOURCE([0.9.4], [unistr/u32-strcoll.c])
 
 Makefile.am: