On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berra...@redhat.com> writes: > > > Localization is not a feature whose impact is limited to the UI > > frontends. Other parts of QEMU rely in localization. > > Typically by accident :) > > > In particular the > > USB MTP driver needs to be able to convert filenames from the locale > > specified character set into UTF-16 / UCS-2 encoded wide characters. > > Oookay. I guess that makes some sense. > > > setlocale() is only set from two of the UI frontends though, and worse, > > there is inconsistent behaviour with GTK setting LC_CTYPE to C.UTF-8, > > while ncurses honours whatever is set in the user's environment. > > > > This causes the USP MTP driver to behave differently depending on which > > UI frontend is activated. > > > > Furthermore, the curses settings are dangerous because LC_CTYPE will affect > > the is{upper,lower,alnum} functions which much QEMU code assumes to have > > C locale sorting behaviour. This also breaks QMP if the env requests a > > non-UTF-8 locale, since QMP is defined to use UTF-8 encoding for JSON. > > Can you elaborate on how QMP breaks with a non-UTF-8 locale? Ah, you do > in the comment you add to vl.c. > > > This problematic curses code was introduced in: > > > > commit 2f8b7cd587558944532f587abb5203ce54badba9 > > Author: Samuel Thibault <samuel.thiba...@ens-lyon.org> > > Date: Mon Mar 11 14:51:27 2019 +0100 > > > > curses: add option to specify VGA font encoding > > > > This patch moves the GTK frontend setlocale() handling into the main() > > method. This ensures QMP and other QEMU code has a predictable C.UTF-8. > > > > Eventually QEMU should set LC_ALL, honouring the full user environment, > > but this needs various cleanups in QEMU code first. > > Eventually, we'll all be dead. > > > Hardcoding LC_CTYPE > > to C.UTF-8 is a partial regression vs the above curses commit, since it > > will break the curses wide character handling for non-UTF-8 locales but > > this is unavoidable until QEMU is cleaned up to cope with non-UTF-8 > > locales fully. > > > > Setting of LC_MESSAGES is left in the GTK code since only the GTK > > frontend is using translation of strings. This lets us avoid the > > platform portability problem where LC_MESSAGES is not provided by > > locale.h on MinGW. GTK pulls it in indirectly from libintl.h via > > gi18n.h header, but we don't want to pull that into the global > > QEMU namespace. > > > > Signed-off-by: Daniel P. Berrangé <berra...@redhat.com> > > --- > > > > Changed in v2: > > > > - Leave LC_MESSAGES setting in gtk code to avoid platform > > portability problems. > > > > ui/curses.c | 2 -- > > ui/gtk.c | 32 +++++++++++--------------------- > > vl.c | 40 ++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 51 insertions(+), 23 deletions(-) > > > > diff --git a/ui/curses.c b/ui/curses.c > > index cc6d6da684..403cd57913 100644 > > --- a/ui/curses.c > > +++ b/ui/curses.c > > @@ -27,7 +27,6 @@ > > #include <sys/ioctl.h> > > #include <termios.h> > > #endif > > -#include <locale.h> > > #include <wchar.h> > > #include <langinfo.h> > > #include <iconv.h> > > @@ -716,7 +715,6 @@ static void curses_display_init(DisplayState *ds, > > DisplayOptions *opts) > > } > > #endif > > > > - setlocale(LC_CTYPE, ""); > > if (opts->u.curses.charset) { > > font_charset = opts->u.curses.charset; > > } > > diff --git a/ui/gtk.c b/ui/gtk.c > > index e96e15435a..e6a41c79ab 100644 > > --- a/ui/gtk.c > > +++ b/ui/gtk.c > > @@ -2208,12 +2208,12 @@ static void gtk_display_init(DisplayState *ds, > > DisplayOptions *opts) > > > > s->free_scale = FALSE; > > > > - /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. > > For > > - * LC_CTYPE, we need to make sure that non-ASCII characters are > > considered > > - * printable, but without changing any of the character classes to make > > - * sure that we don't accidentally break implicit assumptions. */ > > + /* > > + * See comment in main() for why it has only setup LC_CTYPE, > > + * as opposed to LC_ALL. Given that we need to enable > > + * LC_MESSAGES too for menu translations. > > + */ > > setlocale(LC_MESSAGES, ""); > > - setlocale(LC_CTYPE, "C.UTF-8"); > > bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR); > > textdomain("qemu"); > > > > @@ -2262,22 +2262,12 @@ static void gtk_display_init(DisplayState *ds, > > DisplayOptions *opts) > > > > static void early_gtk_display_init(DisplayOptions *opts) > > { > > - /* The QEMU code relies on the assumption that it's always run in > > - * the C locale. Therefore it is not prepared to deal with > > - * operations that produce different results depending on the > > - * locale, such as printf's formatting of decimal numbers, and > > - * possibly others. > > - * > > - * Since GTK+ calls setlocale() by default -importing the locale > > - * settings from the environment- we must prevent it from doing so > > - * using gtk_disable_setlocale(). > > - * > > - * QEMU's GTK+ UI, however, _does_ have translations for some of > > - * the menu items. As a trade-off between a functionally correct > > - * QEMU and a fully internationalized UI we support importing > > - * LC_MESSAGES from the environment (see the setlocale() call > > - * earlier in this file). This allows us to display translated > > - * messages leaving everything else untouched. > > + /* > > + * GTK calls setlocale() by default, importing the locale > > + * settings from the environment. QEMU's main() method will > > + * have set LC_MESSAGES and LC_CTYPE to allow GTK to display > > + * translated messages, including use of wide characters. We > > + * must not allow GTK to change other aspects of the locale. > > */ > > gtk_disable_setlocale(); > > gtkinit = gtk_init_check(NULL, NULL); > > diff --git a/vl.c b/vl.c > > index c696ad2a13..87a55cddb3 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -49,6 +49,7 @@ int main(int argc, char **argv) > > #define main qemu_main > > #endif /* CONFIG_COCOA */ > > > > +#include <locale.h> > > > > #include "qemu/error-report.h" > > #include "qemu/sockets.h" > > @@ -3022,6 +3023,45 @@ int main(int argc, char **argv, char **envp) > > char *dir, **dirs; > > BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue); > > > > + /* > > + * Ideally we would set LC_ALL, but QEMU currently isn't able to cope > > + * with arbitrary localization settings. In particular there are two > > + * known problems > > + * > > + * - The QMP monitor needs to use the C locale rules for numeric > > + * formatting. This would need a double/int -> string formatter > > + * that is locale independant. > > Aha. > > Reminds me of the story of a FORTRAN compiler that rejected only > far-away customers' programs... > > > + * > > + * - The QMP monitor needs to encode all data as UTF-8. This needs > > + * to be updated to use iconv(3) to explicitly convert the current > > + * locale's charset into utf-8 > > Really? What locale-dependent text gets sent to QMP?
The main thing I can see would be filenames. Though having said it is UTF-8 on looking more closely I think QEMU is probably 8-bit clean in its handling, so will just be blindly passing whatever filename string it get from libvirt straight on to the kernel with no interpretation. Libvirt has enabled UTF-8 validation in its JSON library when encoding data it sends to QEMU, so any data libvirt is sending will be a valid UTF-8 byte sequence at least. Libvirt doesn't axctually do any charset conversion though, so if libvirt runs in a non-UTF8 locale it will likely trip over this UTF-8 validation. > > > + * > > + * - Lots of codes uses is{upper,lower,alnum,...} functions, > > expecting > > + * C locale sorting behaviour. Most QEMU usage should likely be > > + * changed to g_ascii_is{upper,lower,alnum...} to match code > > + * assumptions, without being broken by locale settnigs. > > + * > > + * We do still have two requirements > > + * > > + * - Ability to correct display translated text according to the > > + * user's locale > > + * > > + * - Ability to handle multibyte characters, ideally according to > > + * user's locale specified character set. This affects ability > > + * of usb-mtp to correctly convert filenames to UCS16 and curses > > + * & GTK frontends wide character display. > > + * > > + * The second requirement would need LC_CTYPE to be honoured, but > > + * this conflicts with the 2nd & 3rd problems listed earlier. For > > + * now we make a tradeoff, trying to set an explicit UTF-8 localee > > + * > > + * Note we can't set LC_MESSAGES here, since mingw doesn't define > > + * this constant in locale.h Fortunately we only need it for the > > + * GTK frontend and that uses gi18n.h which pulls in a definition > > + * of LC_MESSAGES. > > + */ > > + setlocale(LC_CTYPE, "C.UTF-8"); > > + > > module_call_init(MODULE_INIT_TRACE); > > > > qemu_init_cpu_list(); > > We should've stayed out of the GUI business. This isn't only a GUI problem as above, it affects USB MTP. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|