Bruno Haible wrote: > Hi Paolo, > >> Before proceeding, however, I'm curious whether using nl_langinfo >> (CODESET) is less precise than locale_charset on some platform. Bruno? > > Here's my reply to Jim from yesterday. For some reason it was apparently > not distributed to the mailing list. > > Hi Jim, > >> @@ -893,7 +896,9 @@ init_dfa (re_dfa_t *dfa, size_t pat_len) >> dfa->map_notascii = (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MAP_TO_NONASCII) >> != 0); >> #else >> - if (strcmp (locale_charset (), "UTF-8") == 0) >> + codeset_name = nl_langinfo (CODESET); >> + if (strcasecmp (codeset_name, "UTF-8") == 0 >> + || strcasecmp (codeset_name, "UTF8") == 0) >> dfa->is_utf8 = 1; >> >> /* We check exhaustively in the loop below if this charset is a > > This patch is not wrong: It takes care of the fact that the result > of nl_langinfo(CODESET) can be in upper case or in lower case, > depending on the system, and that on HP-UX, "utf8" is returned > (see lib/config.charset). > > But I would nevertheless not apply it nor recommend it, because
By "it" I presume you mean only the remainder: the use nl_langinfo-unconditionally part. > the nl_langinfo module may include a lot more stuff in the future: > - It may include real localizations of the values, instead of returning > English dummy values. I have already written the converter from > glibc locale data to PO files that can be read by the nl_langinfo > replacement. > - It may include an emulation of the NL_LOCALE_NAME(category) > macro that works since glibc 2.11.1. This emulation would rely on > the 'localename' module. > When all you need is nl_langinfo(CODESET), the full-blown 'nl_langinfo' > module is too heavyweight. 'localcharset' is not a POSIX API, but fits > better due to the gnulib module structure. At first, I was thinking of reverting most of that patch, and even wrote the log and committed (locally) the result.[1] But then I thought "What systems would suffer if I didn't?" Only, shall we say, "challenged" systems would suffer the added overhead. Overhead that is as yet only theoretical/planned and of course unmeasured. In general, gnulib strives to use the "right" APIs, and if that imposes some overhead on older or less-functional systems, that's not a problem. Making the code maintainable for everyone is more important than sacrificing even a small amount of readability or maintainability for some unspecified gains. Of course, if using nl_langinfo imposes some inordinate overhead on important systems, then that should be addressed. Bruno would you object to waiting until there are measurements that show there is enough overhead to warrant reverting the change? The exposure of using nl_langinfo here has already shaken out one bug... [1] Here's the patch I wrote (not to be applied now). Posting in case it will come in handy later. >From da3f9dc64db294506c53c244e36800b0f4d958d3 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@redhat.com> Date: Tue, 5 Jan 2010 22:17:43 +0100 Subject: [PATCH] regcomp: revert most of 0cfc3b87f0c3be63db1075ed465443c4b3c4cec2 ... on advice from Bruno Haible that gnulib's nl_langinfo may end up being relatively heavy-weight compared to the existing, localcharset-based work-around. * lib/regcomp.c (init_dfa) [!LIBC]: Do *not* always use nl_langinfo (CODESET). * modules/regex (Depends-on): Remove nl_langinfo. --- ChangeLog | 10 ++++++++++ lib/regcomp.c | 3 --- modules/regex | 1 - 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/ChangeLog b/ChangeLog index 80ba5cd..0d5084d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,13 @@ +2010-01-05 Jim Meyering <meyer...@redhat.com> + + regcomp: revert most of 0cfc3b87f0c3be63db1075ed465443c4b3c4cec2 + ... on advice from Bruno Haible that gnulib's nl_langinfo may + end up being relatively heavy-weight compared to the existing, + localcharset-based work-around. + * lib/regcomp.c (init_dfa) [!LIBC]: Do *not* always use + nl_langinfo (CODESET). + * modules/regex (Depends-on): Remove nl_langinfo. + 2010-01-05 Aurelien Jarno <aurel...@aurel32.net> utimens (fdutimens): ignore a negative FD, per contract diff --git a/lib/regcomp.c b/lib/regcomp.c index ebb696a..802f4a8 100644 --- a/lib/regcomp.c +++ b/lib/regcomp.c @@ -850,9 +850,6 @@ static reg_errcode_t init_dfa (re_dfa_t *dfa, size_t pat_len) { __re_size_t table_size; -#ifndef _LIBC - char *codeset_name; -#endif #ifdef RE_ENABLE_I18N size_t max_i18n_object_size = MAX (sizeof (wchar_t), sizeof (wctype_t)); #else diff --git a/modules/regex b/modules/regex index f516406..c6a1235 100644 --- a/modules/regex +++ b/modules/regex @@ -22,7 +22,6 @@ memcmp memmove mbrtowc mbsinit -nl_langinfo stdbool stdint ssize_t -- 1.6.6.387.g2649b1