On Saturday, 18 June 2016 at 12:08:23, Georg Baum 
<georg.b...@post.rwth-aachen.de> wrote:
> Kornel Benko wrote:
> 
> > Setting a 'wrong' LANG environment causes LyX to use a different
> > encoding for file names.
> > 
> > Setting
> > export LANG="en_IE@euro"
> > 
> > Now, reading the file "Testoübernahme.lyx" which needs conversion leads to
> > this log snippet:
> > 
> > support/TempFile.cpp (35): Temporary file in
> > /home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatXXXXXX.lyx
> > support/TempFile.cpp (38): Temporary file
> > `/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx'
> > created.
> > Buffer.cpp (1297): Running 'python -tt
> > "/usr/local/share/lyx2.3/lyx2lyx/lyx2lyx" -t 509 -o
> > "/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx"
> > "/usr2/kornel/lyx/privat/Briefe-Edgar/Testoübernahme.lyx"'
> > usage: lyx2lyx [options] [file]
> > lyx2lyx: error: argument input: invalid cmd_arg value:
> > '/usr2/kornel/lyx/privat/Briefe-Edgar/Testo\xc3\xbcbernahme.lyx'
> > 
> > Everything is OK, if using e.g. LANG="en_IE.utf8".
> > 
> > From my POV, encoding of file-names should not depend on locales.
> 
> TL;DR: The current behaviour is probably correct, or QFile::encodeName() has 
> a bug.
> 
> Unfortunately this is complicated, but I'll try to explain. First let's have 
> a look at how file names are stored in the file system. This depends of 
> course on the file system type. Both NTFS on Windows and HFS+ on OS X store 
> file names encoded in UTF-16 (see https://en.wikipedia.org/wiki/NTFS and 
> https://en.wikipedia.org/wiki/HFS_Plus). This is simple and reliable: any 
> program or operating system that deals with the file system directly (e.g. 
> when mounting it on a different machine) knows how to interpret file names 
> and can present them to the user in the correct way.
> 
> For other file systems such as FAT or the typical Linux file systems (e.g. 
> ext3) the situation is a mess. ext3 and relatives do not specify in which 
> encoding a file name is stored. They only know bytes (see e.g. 
> http://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding).
> The interpretation of the bytes is left to user space, and this is where 
> the locale comes into play: if the locale is set to en_IE@euro, and 
> you create a file, the encoding of the file name will be iso_8859-15. If you 
> do the same while the locale is set to en_IE.utf8, the encoding of the file name 
> will be utf8. This used to cause big trouble in the transition period from 
> fixed-width 8-bit locales to utf8, when people had file names with non-ASCII 
> letters, moved the old hard disk to a machine with a newer Linux, and 
> suddenly all file names looked broken. This is why utilities like convmv 
> were invented, and why the codepage= and iocharset= options exist for 
> mounting FAT file systems on Linux.
> 
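
To make the point about bytes concrete, here is a minimal Python sketch (my
own illustration, not code from LyX or lyx2lyx). It only shows that the same
visible name turns into different byte sequences depending on the encoding
used when the file is created:

```python
# The same visible name yields different bytes on disk, depending on the
# encoding the creating process used.
name = "Testoübernahme.lyx"

utf8_bytes = name.encode("utf-8")          # what a utf8 locale would write
latin9_bytes = name.encode("iso-8859-15")  # what en_IE@euro would write

print(utf8_bytes)    # b'Testo\xc3\xbcbernahme.lyx'
print(latin9_bytes)  # b'Testo\xfcbernahme.lyx'
```

The two-byte sequence \xc3\xbc is the same one that shows up in the lyx2lyx
error message quoted above.
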
> What happens in your case is the following: LyX does _not_ use the 
> iso_8859-15 encoding when calling lyx2lyx. This can be seen from the error 
> message: if it used iso_8859-15, the ü would not be encoded as the two 
> bytes \xc3\xbc, which is its utf8 form. Here we might have a bug in 
> QFile::encodeName(), which is used internally, but I rather suspect that 
> you still have some LC_* variables set to a utf8 encoding. Unfortunately 
> the Qt documentation is rather unspecific about how exactly the "local 
> 8-bit encoding determined by the user's locale" (which is used by 
> QFile::encodeName()) is determined. One would have to read the sources.
> 
> Assuming that LyX really did pass the file name encoded in iso_8859-15 to 
> lyx2lyx, the command-line argument decoding in lyx2lyx would work (I did 
> spend some evenings to understand how this works and to implement the 
> current parsing interface in lyx2lyx). However, lyx2lyx would then fail 
> when trying to read the input file. The reason is that your original file 
> was created with an active utf8 locale, but the current locale tells 
> lyx2lyx to use iso_8859-15 for the file name. It would work if you called 
> convmv to convert the file name in the file system to iso_8859-15 before 
> starting LyX.
> 
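
The mismatch described here can be sketched in a few lines of Python. This is
only an illustration under the assumption that the file was created under a
utf8 locale and is later looked up under iso_8859-15; it does not touch the
file system and is not what lyx2lyx actually runs:

```python
# Name as stored on disk, created while a utf8 locale was active.
on_disk = "Testoübernahme.lyx".encode("utf-8")

# A process running under iso_8859-15 reads each byte as one character.
seen_as = on_disk.decode("iso-8859-15")
print(seen_as)  # TestoÃŒbernahme.lyx

# The bytes that same process would ask the file system for.
wanted = "Testoübernahme.lyx".encode("iso-8859-15")
print(wanted == on_disk)  # False
```

Since ext3 and friends compare bytes, the lookup for `wanted` finds nothing
until the name on disk is converted, which is the job convmv does.
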
> Encoding command-line arguments of programs according to the currently 
> active locale is standard among all operating systems (see e.g. 
> http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv). So 
> for the case that the user calls lyx2lyx directly in a terminal, or from a 
> different program than LyX, the current lyx2lyx behaviour is correct (I 
> tested that using different encodings). If you want to test this as well, 
> you need to make sure that every locale-related environment variable that 
> is currently set gets changed to the wanted locale. These may be LANG, 
> LANGUAGE and LC_*. When using a terminal emulator under X, you also need to 
> change the encoding of the terminal emulator, because this determines how 
> the keyboard input that is fed to the shell is encoded.
> 
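
For checking which locale settings are really in effect, a small diagnostic
like the following may help. It is a sketch only: locale_report() is a
hypothetical helper name, not part of LyX or lyx2lyx, and it merely reports
what Python sees. It relies on the usual precedence for the character
encoding, where LC_ALL overrides LC_CTYPE, which overrides LANG:

```python
import locale
import os
import sys


def locale_report(environ=os.environ):
    """Collect the settings that influence how argv and file names are decoded."""
    variables = {
        key: value
        for key, value in environ.items()
        if key in ("LANG", "LANGUAGE") or key.startswith("LC_")
    }
    # For the character encoding, LC_ALL wins over LC_CTYPE, which wins over LANG.
    effective = (
        environ.get("LC_ALL") or environ.get("LC_CTYPE") or environ.get("LANG") or ""
    )
    return {
        "variables": variables,
        "effective_ctype_setting": effective,
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for key, value in locale_report().items():
    print(key, "=", value)
```

If LANG says en_IE@euro but a leftover LC_CTYPE or LC_ALL still names a utf8
locale, the report shows the utf8 one as the effective setting.
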
> If called from LyX we could simply decide to use utf8 for lyx2lyx 
> command-line arguments. Of course this would have to be requested by a 
> special command-line parameter, so that non-LyX usage of lyx2lyx does not 
> break. I do not see any real advantage in doing this, though. We would not 
> need the ugly FileName::toSafeFilesystemEncoding() on Windows, and we would 
> be able to encode every file name for the lyx2lyx command line, but on 
> Linux, if the file name is not encodable in the current locale, lyx2lyx 
> would still fail when trying to open the file.
> 
> 
> Georg

Thanks for the clarification. Nonetheless, we have a mess here.
1.) Reading a .lyx file that needs no conversion (e.g. one in the current 
    LyX format) works regardless of the environment.
    (This is done by LyX directly, without interpreting the file name.)
2.) Reading a .lyx file that has to go through lyx2lyx does not work.
So my question: why not pass the file name to lyx2lyx (or any other program) 
without _any_ interpretation?
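
As a rough idea of what "no interpretation" could look like on the Python
side, here is a sketch that assumes a POSIX system. It uses os.fsdecode() and
os.fsencode(), which carry undecodable bytes through as surrogate escapes. I
am not claiming that lyx2lyx works this way today:

```python
import os

# Bytes exactly as stored in the file system (utf8 name).
raw = b"Testo\xc3\xbcbernahme.lyx"

as_str = os.fsdecode(raw)          # bytes that cannot be decoded are escaped
assert os.fsencode(as_str) == raw  # the original bytes come back unchanged

# A legacy iso_8859-15 name survives the same round trip.
legacy = b"Testo\xfcbernahme.lyx"
assert os.fsencode(os.fsdecode(legacy)) == legacy
```

The point is that the bytes handed to open() are the same bytes that came in,
whatever the locale thinks they mean.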

        Kornel
