Daniel P. Berrangé <berra...@redhat.com> writes: > Localization is not a feature whose impact is limited to the UI > frontends. Other parts of QEMU rely in localization.
Typically by accident :) > In particular the > USB MTP driver needs to be able to convert filenames from the locale > specified character set into UTF-16 / UCS-2 encoded wide characters. Oookay. I guess that makes some sense. > setlocale() is only set from two of the UI frontends though, and worse, > there is inconsistent behaviour with GTK setting LC_CTYPE to C.UTF-8, > while ncurses honours whatever is set in the user's environment. > > This causes the USP MTP driver to behave differently depending on which > UI frontend is activated. > > Furthermore, the curses settings are dangerous because LC_CTYPE will affect > the is{upper,lower,alnum} functions which much QEMU code assumes to have > C locale sorting behaviour. This also breaks QMP if the env requests a > non-UTF-8 locale, since QMP is defined to use UTF-8 encoding for JSON. Can you elaborate on how QMP breaks with a non-UTF-8 locale? Ah, you do in the comment you add to vl.c. > This problematic curses code was introduced in: > > commit 2f8b7cd587558944532f587abb5203ce54badba9 > Author: Samuel Thibault <samuel.thiba...@ens-lyon.org> > Date: Mon Mar 11 14:51:27 2019 +0100 > > curses: add option to specify VGA font encoding > > This patch moves the GTK frontend setlocale() handling into the main() > method. This ensures QMP and other QEMU code has a predictable C.UTF-8. > > Eventually QEMU should set LC_ALL, honouring the full user environment, > but this needs various cleanups in QEMU code first. Eventually, we'll all be dead. > Hardcoding LC_CTYPE > to C.UTF-8 is a partial regression vs the above curses commit, since it > will break the curses wide character handling for non-UTF-8 locales but > this is unavoidable until QEMU is cleaned up to cope with non-UTF-8 > locales fully. > > Setting of LC_MESSAGES is left in the GTK code since only the GTK > frontend is using translation of strings. This lets us avoid the > platform portability problem where LC_MESSAGES is not provided by > locale.h on MinGW. GTK pulls it in indirectly from libintl.h via > gi18n.h header, but we don't want to pull that into the global > QEMU namespace. > > Signed-off-by: Daniel P. Berrangé <berra...@redhat.com> > --- > > Changed in v2: > > - Leave LC_MESSAGES setting in gtk code to avoid platform > portability problems. > > ui/curses.c | 2 -- > ui/gtk.c | 32 +++++++++++--------------------- > vl.c | 40 ++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 51 insertions(+), 23 deletions(-) > > diff --git a/ui/curses.c b/ui/curses.c > index cc6d6da684..403cd57913 100644 > --- a/ui/curses.c > +++ b/ui/curses.c > @@ -27,7 +27,6 @@ > #include <sys/ioctl.h> > #include <termios.h> > #endif > -#include <locale.h> > #include <wchar.h> > #include <langinfo.h> > #include <iconv.h> > @@ -716,7 +715,6 @@ static void curses_display_init(DisplayState *ds, > DisplayOptions *opts) > } > #endif > > - setlocale(LC_CTYPE, ""); > if (opts->u.curses.charset) { > font_charset = opts->u.curses.charset; > } > diff --git a/ui/gtk.c b/ui/gtk.c > index e96e15435a..e6a41c79ab 100644 > --- a/ui/gtk.c > +++ b/ui/gtk.c > @@ -2208,12 +2208,12 @@ static void gtk_display_init(DisplayState *ds, > DisplayOptions *opts) > > s->free_scale = FALSE; > > - /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. For > - * LC_CTYPE, we need to make sure that non-ASCII characters are > considered > - * printable, but without changing any of the character classes to make > - * sure that we don't accidentally break implicit assumptions. */ > + /* > + * See comment in main() for why it has only setup LC_CTYPE, > + * as opposed to LC_ALL. Given that we need to enable > + * LC_MESSAGES too for menu translations. > + */ > setlocale(LC_MESSAGES, ""); > - setlocale(LC_CTYPE, "C.UTF-8"); > bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR); > textdomain("qemu"); > > @@ -2262,22 +2262,12 @@ static void gtk_display_init(DisplayState *ds, > DisplayOptions *opts) > > static void early_gtk_display_init(DisplayOptions *opts) > { > - /* The QEMU code relies on the assumption that it's always run in > - * the C locale. Therefore it is not prepared to deal with > - * operations that produce different results depending on the > - * locale, such as printf's formatting of decimal numbers, and > - * possibly others. > - * > - * Since GTK+ calls setlocale() by default -importing the locale > - * settings from the environment- we must prevent it from doing so > - * using gtk_disable_setlocale(). > - * > - * QEMU's GTK+ UI, however, _does_ have translations for some of > - * the menu items. As a trade-off between a functionally correct > - * QEMU and a fully internationalized UI we support importing > - * LC_MESSAGES from the environment (see the setlocale() call > - * earlier in this file). This allows us to display translated > - * messages leaving everything else untouched. > + /* > + * GTK calls setlocale() by default, importing the locale > + * settings from the environment. QEMU's main() method will > + * have set LC_MESSAGES and LC_CTYPE to allow GTK to display > + * translated messages, including use of wide characters. We > + * must not allow GTK to change other aspects of the locale. > */ > gtk_disable_setlocale(); > gtkinit = gtk_init_check(NULL, NULL); > diff --git a/vl.c b/vl.c > index c696ad2a13..87a55cddb3 100644 > --- a/vl.c > +++ b/vl.c > @@ -49,6 +49,7 @@ int main(int argc, char **argv) > #define main qemu_main > #endif /* CONFIG_COCOA */ > > +#include <locale.h> > > #include "qemu/error-report.h" > #include "qemu/sockets.h" > @@ -3022,6 +3023,45 @@ int main(int argc, char **argv, char **envp) > char *dir, **dirs; > BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue); > > + /* > + * Ideally we would set LC_ALL, but QEMU currently isn't able to cope > + * with arbitrary localization settings. In particular there are two > + * known problems > + * > + * - The QMP monitor needs to use the C locale rules for numeric > + * formatting. This would need a double/int -> string formatter > + * that is locale independant. Aha. Reminds me of the story of a FORTRAN compiler that rejected only far-away customers' programs... > + * > + * - The QMP monitor needs to encode all data as UTF-8. This needs > + * to be updated to use iconv(3) to explicitly convert the current > + * locale's charset into utf-8 Really? What locale-dependent text gets sent to QMP? > + * > + * - Lots of codes uses is{upper,lower,alnum,...} functions, expecting > + * C locale sorting behaviour. Most QEMU usage should likely be > + * changed to g_ascii_is{upper,lower,alnum...} to match code > + * assumptions, without being broken by locale settnigs. > + * > + * We do still have two requirements > + * > + * - Ability to correct display translated text according to the > + * user's locale > + * > + * - Ability to handle multibyte characters, ideally according to > + * user's locale specified character set. This affects ability > + * of usb-mtp to correctly convert filenames to UCS16 and curses > + * & GTK frontends wide character display. > + * > + * The second requirement would need LC_CTYPE to be honoured, but > + * this conflicts with the 2nd & 3rd problems listed earlier. For > + * now we make a tradeoff, trying to set an explicit UTF-8 localee > + * > + * Note we can't set LC_MESSAGES here, since mingw doesn't define > + * this constant in locale.h Fortunately we only need it for the > + * GTK frontend and that uses gi18n.h which pulls in a definition > + * of LC_MESSAGES. > + */ > + setlocale(LC_CTYPE, "C.UTF-8"); > + > module_call_init(MODULE_INIT_TRACE); > > qemu_init_cpu_list(); We should've stayed out of the GUI business.