On Saturday, 18 June 2016 at 12:08:23, Georg Baum <georg.b...@post.rwth-aachen.de> wrote:
> Kornel Benko wrote:
>
> > Setting a 'wrong' lang environment causes lyx to use a different encoding
> > for filenames.
> >
> > Setting
> > export LANG="en_IE@euro"
> >
> > Now, reading the file "Testoübernahme.lyx" which needs conversion leads to
> > this log snippet:
> >
> > support/TempFile.cpp (35): Temporary file in
> > /home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatXXXXXX.lyx
> > support/TempFile.cpp (38): Temporary file
> > `/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx'
> > created.
> > Buffer.cpp (1297): Running 'python -tt
> > "/usr/local/share/lyx2.3/lyx2lyx/lyx2lyx" -t 509 -o
> > "/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx"
> > "/usr2/kornel/lyx/privat/Briefe-Edgar/Testoübernahme.lyx"'
> > usage: lyx2lyx [options] [file]
> > lyx2lyx: error: argument input: invalid cmd_arg value:
> > '/usr2/kornel/lyx/privat/Briefe-Edgar/Testo\xc3\xbcbernahme.lyx'
> >
> > Everything is OK, if using e.g. LANG="en_IE.utf8".
> >
> > From my POV, encoding of file-names should not depend on locales.
>
> TL;DR: The current behaviour is probably correct, or QFile::encodeName() has
> a bug.
>
> Unfortunately this is complicated, but I'll try to explain. First let's have
> a look at how file names are stored in the file system. This depends of
> course on the file system type. Both NTFS on windows and HFS+ on OS X store
> file names encoded in utf-16 (see https://en.wikipedia.org/wiki/NTFS and
> https://en.wikipedia.org/wiki/HFS_Plus). This is simple and reliable: any
> program or operating system that deals with the file system directly (e.g.
> when mounting it on a different machine) knows how to interpret file names
> and can present them to the user in the correct way.
>
> For other file systems such as FAT or the typical linux file systems (e.g.
> ext3) the situation is a mess.
> ext3 and relatives do not specify in which
> encoding a file name is stored. They only know bytes (see e.g.
> http://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding).
> The interpretation of the bytes is left to user space, and this is where
> the locale comes into play: If the locale is set to en_IE@euro and you
> create a file, the encoding of the file name will be iso_8859-15. If you do
> the same while the locale is set to en_IE.utf8, the encoding of the file
> name will be utf8. This used to cause big trouble in the transition period
> from fixed width 8bit locales to utf8, when people had file names with
> non-ascii letters, used the old hard disk on a machine with a newer Linux,
> and suddenly all file names looked broken. Therefore utilities like convmv
> were invented, and when mounting FAT file systems on linux the codepage=
> and iocharset= options can be used.
>
> What happens in your case is the following: LyX does _not_ use the
> iso_8859-15 encoding when calling lyx2lyx. This can be seen from the error
> message: if it used iso_8859-15, then the ü would not be encoded in two
> bytes. Here we might have a bug in QFile::encodeName() that is used
> internally, but I rather suspect that you still have some LC_* variables set
> to use a utf8 encoding. Unfortunately the qt documentation is rather
> unspecific about how exactly the "local 8-bit encoding determined by the
> user's locale" (which is used by QFile::encodeName()) is determined; one
> would have to read the sources.
>
> Assuming that LyX really passed the file name encoded in iso_8859-15 to
> lyx2lyx, the commandline argument decoding in lyx2lyx would work (I did
> spend some evenings to understand how this works and to implement the
> current parsing interface in lyx2lyx). However, when lyx2lyx then tried to
> read the input file it would not work.
> The reason for this is that your
> original file was created with an active utf8 locale, but the current locale
> tells lyx2lyx to use iso_8859-15 for decoding the file name. It would work
> if you called convmv to convert the file name in the file system to
> iso_8859-15 before starting LyX.
>
> Encoding commandline arguments of programs according to the currently active
> locale is standard among all operating systems (see e.g.
> http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv). So
> for the case that the user calls lyx2lyx directly in a terminal, or from a
> different program than LyX, the current lyx2lyx behaviour is correct (I
> tested that using different encodings). If you want to test this as well,
> you need to make sure that every locale-related environment variable that
> is currently set gets changed to the wanted locale. These may be LANG,
> LANGUAGE and LC_*. When using a terminal emulator from X, you also need to
> change the encoding of the terminal emulator, because this determines how
> the keyboard input that is fed to the shell is encoded.
>
> If called from LyX we could simply decide to use utf8 for lyx2lyx
> commandline arguments. Of course this would have to be specified by a
> special commandline parameter, so that non-LyX usage of lyx2lyx does not
> break. I do not see any real advantage in doing this. We would not need
> the ugly FileName::toSafeFilesystemEncoding() on windows, and we would be
> able to encode every file for the lyx2lyx commandline, but on linux, if the
> file name is not encodable by the current locale, lyx2lyx would fail when
> trying to open the file.
>
>
> Georg
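
To make the byte-level argument above concrete, here is a minimal sketch, assuming Python 3 (the log shows lyx2lyx started via "python -tt", so this is not lyx2lyx code). The helper name encoded_forms is made up for illustration. It shows that the same file name yields different bytes under utf8 and iso_8859-15, and that the \xc3\xbc seen in the error message is the utf8 form of ü.

```python
# Illustration only, assuming Python 3. Not taken from LyX or lyx2lyx.
NAME = "Testoübernahme.lyx"


def encoded_forms(name):
    """Return the bytes of `name` under the two encodings discussed above.

    Hypothetical helper, introduced only for this example.
    """
    return {
        "utf-8": name.encode("utf-8"),
        "iso8859-15": name.encode("iso8859-15"),
    }


forms = encoded_forms(NAME)
for codec, raw in forms.items():
    # The utf-8 form is one byte longer: ü becomes the pair 0xC3 0xBC,
    # while iso8859-15 stores it as the single byte 0xFC.
    print(codec, len(raw), raw)

# A name written under a utf8 locale, then interpreted under an
# iso_8859-15 locale: the two bytes of ü turn into two wrong characters.
misread = forms["utf-8"].decode("iso8859-15")
print(ascii(misread))
```

The last step is the situation convmv was written for: the bytes on disk are unchanged, only their interpretation differs.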
Thanks for the clarification. Nonetheless, we have a mess here.

1.) Reading a .lyx file that needs no conversion (e.g. one in the current
    lyx format) works regardless of the environment. (This is done by lyx
    directly, without interpreting the file name.)
2.) Reading a .lyx file that has to go through lyx2lyx does not work.

So my question: Why not pass the filename to lyx2lyx (or any other program)
without _any_ interpretation?

	Kornel
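
One possible reading of "without any interpretation" is to treat the file name as an opaque byte string from end to end. The sketch below assumes Python 3 and is not how LyX or lyx2lyx handle names. The helpers to_text and to_bytes are hypothetical. It only demonstrates the "surrogateescape" error handler, which lets bytes that are invalid in the assumed encoding survive a round trip through a str unchanged.

```python
# Illustration only, assuming Python 3. Helper names are hypothetical.

def to_text(raw: bytes) -> str:
    """Decode a file name as utf-8, keeping undecodable bytes as lone
    surrogate code points instead of raising an error."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Recover the exact original bytes of a name produced by to_text()."""
    return text.encode("utf-8", errors="surrogateescape")


# The same name as it would be stored under each of the two locales.
latin9_name = "Testoübernahme.lyx".encode("iso8859-15")  # contains 0xFC
utf8_name = "Testoübernahme.lyx".encode("utf-8")         # contains 0xC3 0xBC

for raw in (latin9_name, utf8_name):
    text = to_text(raw)
    # The byte sequence survives unchanged, whichever encoding produced it.
    assert to_bytes(text) == raw
    # ascii() is used because a str holding lone surrogates cannot be
    # printed directly on a utf-8 terminal.
    print(ascii(text))
```

On POSIX systems, Python 3's os.fsencode() and os.fsdecode() apply the same error handler with the file system encoding. The sketch says nothing about windows, and it does not make the name displayable: the bytes stay intact, but the surrogate placeholder is not a printable character.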