On Tue, Apr 16, 2019 at 06:01:46PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > The main thing I can see would be filenames.
> >
> > Though having said it is UTF-8, on looking more closely I think QEMU is
> > probably 8-bit clean in its handling, so will just be blindly passing
> > whatever filename string it gets from libvirt straight on to the kernel
> > with no interpretation.
>
> Sounds good to me.
>
> > Libvirt has enabled UTF-8 validation in its JSON library when encoding
> > data it sends to QEMU, so any data libvirt is sending will be a valid
> > UTF-8 byte sequence at least. Libvirt doesn't actually do any charset
> > conversion though, so if libvirt runs in a non-UTF8 locale it will
> > likely trip over this UTF-8 validation.
>
> QMP input must be encoded in UTF-8. Converting from other encodings to
> UTF-8 is the QMP client's problem.
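
To make the client-side conversion concrete, here is a minimal sketch in C of
turning an ISO-8859-1 byte string into UTF-8 by hand. It is only an
illustration, not code from libvirt or QEMU: the helper name
latin1_to_utf8() is hypothetical, and a real client would more likely call
iconv() so that it can handle whatever character set LC_CTYPE names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libvirt or QEMU function): convert a
 * NUL-terminated ISO-8859-1 string to UTF-8.
 *
 * Every ISO-8859-1 byte has the same value as its Unicode code point,
 * so bytes below 0x80 are copied unchanged and bytes 0x80..0xFF become
 * a two-byte UTF-8 sequence.
 *
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if the output buffer is too small.
 */
static size_t latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return (size_t)-1;
    }

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        size_t need = (*p < 0x80) ? 1 : 2;

        /* Always keep one byte in reserve for the terminating NUL. */
        if (n + need + 1 > out_size) {
            return (size_t)-1;
        }
        if (*p < 0x80) {
            out[n++] = (char)*p;
        } else {
            out[n++] = (char)(0xC0 | (*p >> 6));   /* lead byte: 110000xx */
            out[n++] = (char)(0x80 | (*p & 0x3F)); /* continuation: 10xxxxxx */
        }
    }
    out[n] = '\0';
    return n;
}
```

A filename containing the single ISO-8859-1 byte 0xE9 (e with acute accent)
would fail UTF-8 validation if placed in the JSON as-is; after this
conversion it is the two-byte sequence 0xC3 0xA9.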

Ok, so consider the host OS is globally running in a non-UTF-8 locale
such as ISO8859-1. This means that any non-ASCII filenames in the
filesystem are assumed to be in ISO8859-1 encoding. Since QMP input must
be UTF-8, libvirt must convert the filename from the current locale
(ISO8859-1) to UTF-8, otherwise it might be putting an invalid UTF-8
sequence in the JSON. For QEMU to be able to open the file, QEMU must be
honouring the host OS LC_CTYPE, and converting from UTF-8 back to the
LC_CTYPE character set.

>
> The more interesting direction is the one I inquired about: QMP output.
> If locale-dependent text gets sent to QMP, converting it to UTF-8 is
> QEMU's problem.
>
> On closer look, anything but JSON string contents is plain ASCII by
> construction. JSON string contents gets assembled in to_json() case
> QTYPE_QSTRING. It expects QString to use UTF-8[*]. You can have any
> locale as long as it uses ASCII or UTF-8. IOW
>
> >> > + *
> >> > + * - Lots of code uses is{upper,lower,alnum,...} functions, expecting
> >> > + *   C locale sorting behaviour. Most QEMU usage should likely be
> >> > + *   changed to g_ascii_is{upper,lower,alnum...} to match code
> >> > + *   assumptions, without being broken by locale settings.
> >> > + *
> >> > + * We do still have two requirements
> >> > + *
> >> > + * - Ability to correctly display translated text according to the
> >> > + *   user's locale
> >> > + *
> >> > + * - Ability to handle multibyte characters, ideally according to
> >> > + *   user's locale specified character set. This affects ability
> >> > + *   of usb-mtp to correctly convert filenames to UCS16 and curses
> >> > + *   & GTK frontends wide character display.
> >> > + *
> >> > + * The second requirement would need LC_CTYPE to be honoured, but
> >> > + * this conflicts with the 2nd & 3rd problems listed earlier.
> >> > + * For now we make a tradeoff, trying to set an explicit UTF-8 locale
> >> > + *
> >> > + * Note we can't set LC_MESSAGES here, since mingw doesn't define
> >> > + * this constant in locale.h. Fortunately we only need it for the
> >> > + * GTK frontend and that uses gi18n.h which pulls in a definition
> >> > + * of LC_MESSAGES.
> >> > + */
> >> > +    setlocale(LC_CTYPE, "C.UTF-8");
> >> > +
> >> >      module_call_init(MODULE_INIT_TRACE);
> >> >
> >> >      qemu_init_cpu_list();
> >>
> >> We should've stayed out of the GUI business.
> >
> > This isn't only a GUI problem as above, it affects USB MTP.
>
> I believe setlocale() in QEMU is basically wrong. Finding all the
> places that rely on the current locale when they shouldn't and
> converting them to locale-independent alternatives is a huge amount of
> work. Even if we managed to complete it, it wouldn't stay complete.
>
> Instead, find the places that have reason to use the locale, and fix
> them to uselocale().

I think that's fundamentally the wrong way around. Most stuff *should*
be locale dependent, otherwise any interaction with the host OS is likely
to use incorrect localization. It isn't practical to put a uselocale()
call around every place that opens a filename. There are a few places
where QEMU should be locale independent, such as QMP and guest OS ABI
sensitive things, and those are the places that should take account of it.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|