Hi Daiki,

> A while ago Matthias Clasen pointed me to a bug that is caused by a race
> condition between a getenv() call in gettext() and a setenv() call in
> another thread:
> https://bugzilla.gnome.org/show_bug.cgi?id=754951
>
> The direct cause of this bug is that gettext() tries to check LANGUAGE
> envvar, while the string content returned by getenv() can be overwritten
> by setenv() before being used.

And the deeper cause of this bug is that multi-threaded programs are calling
setenv(), although the Glibc manual
  http://www.gnu.org/software/libc/manual/html_node/Environment-Access.html
says:
  "Modifications of environment variables are not allowed in
   multi-threaded programs."

There's a similar rant regarding setenv() in
  http://www.club.cc.cmu.edu/~cmccabe/blog_the_setenv_fiasco.html

Why is this being reported for the LANGUAGE environment variable but not
for the LANG and LC_ALL environment variables? Because for LANG and LC_*
we have an architecture composed of three functionalities:

  (A) environment variables: getenv(), setenv()
  (B) locales: setlocale(), newlocale(), uselocale()
  (C) gettext() and friends

(A) is the bottom-most layer. But it has the limitation that multi-threaded
programs must not call setenv().

(B) is a layer that fetches the initial values from (A), and that allows
mutators (setlocale(), uselocale()) in multi-threaded programs.
So that multi-threaded applications can modify the program's locale after
startup, there is the setlocale() function.
So that multi-threaded programs can have a locale per thread, there is a
uselocale() function.

(C) is an application layer that happens to be in Glibc for convenience
reasons. It is based on the layer (B).

Back to the LANGUAGE environment variable. The problem is that here we have
the layers (A) and (C), but (B) is missing.

The solution ought to be to introduce a layer (B) for LANGUAGE.

LANGUAGE is not specified by POSIX and does not perfectly fit into the
locale system, therefore I believe it is best treated separately.

So, what I imagine is a layer (B) with an API like this:

  /* Returns the language precedence list for the program.  */
  const char *get_i18n_language (void);

  /* Sets the language precedence list for the program.
     NULL means to use the one inferred from the environment variable.  */
  void set_i18n_language (const char *);

or - if you want to have a language per thread -:

  /* Returns the language precedence list for the current thread.  */
  const char *get_i18n_language (void);

  /* Sets the language precedence list for the program.
     NULL means to use the one inferred from the environment variable.  */
  void set_i18n_language (const char *);

  /* Sets the language precedence list for the current thread.
     NULL means to use the one for the program or, if not set, the one
     inferred from the environment variable.  */
  void set_thread_i18n_language (const char *);

You can protect the implementation of these functions with locks
(functions/macros gl_rwlock_*).

With this approach,
  - Multi-threaded programs can change the i18n language in a thread-safe
    way, without using setenv().
  - The setlocale() code is left alone.

> To mitigate this, I was wondering if it would be possible to move the
> getenv() call from gettext() to setlocale(), which is typically called
> at program startup, and cache the value in a static variable.
>
> The attached patch is an experiment implementing that in libintl, though
> in practice it would require a change in glibc's setlocale
> implementation.

I see two drawbacks of this patch:

  * It does not solve the root of the problem, namely the violation
    of the rule "multi-threaded programs must not call setenv()".

  * It modifies the code of setlocale() for a purpose that is unrelated
    to the locale system.

    The fact that glibc's locale/setlocale.c has to increment
    _nl_msg_cat_cntr (notification from layer (B) to layer (C)) is
    already bad enough; it exists because there is no standardized API
    for being notified of locale changes. It forces us to override
    setlocale on non-glibc systems, using gnulibology patterns.

    But adding yet another call from layer (B) to layer (C) is even
    more of a hack.

Bruno