Éric Araujo <mer...@netwok.org> added the comment:

[Toshio, I made you nosy for a question about RPM .spec files]

>> - PKG-INFO (METADATA in distutils2), that already uses a trick to support
>>   Unicode, but your change would replace it in a better way;
> Which "trick"?

Some values are explicitly allowed to use Unicode and are encoded to UTF-8 when queried.

>> - MANIFEST, which with your fix would gain the ability to handle non-ASCII
>>   paths, which is a feature or a bugfix depending on your point of view;
> Wait. Non-encodable bytes are a separate issue. I would like to work on the
> first problem: distutils in Python 3 uses open() without an encoding argument
> and so the encoding depends on the user's locale. Said differently: if you
> produce a file with distutils on one computer, you cannot be sure that the
> file can be read with the same version of Python on another computer (if the
> locale encoding is different). E.g. Windows uses the mbcs encoding whereas
> UTF-8 is the preferred encoding on Linux.
>
> What is the encoding of the MANIFEST file?

Python’s default encoding, unfortunately. Try listing “napoléon” in a MANIFEST file and you’ll get a UnicodeEncodeError because the file wants ASCII.

>> - .def files, used by the compilers for the C linking step; I don’t know if
>>   it’s appropriate to allow UTF-8 there.
>
> I don't know these files.

So we’ll have to get advice from someone well-versed in C linking.

>> - RPM spec files, which use ASCII or UTF-8 according to
>>   http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
>>   it’s not confirmed in
>>   http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
>>   from the LSB site), so there’s no guarantee this works for all RPM
>>   platforms. This sort of platform-specific thing is the reason why RPM
>>   support has been removed in distutils2.
> UTF-8 is a superset of ASCII. If you use utf-8 but only write ascii
> characters, your output file will be written as utf-8... but it will also
> be valid ascii.
> It's magical :-)

I know that, but it does not answer the question: Is it okay for these files to use UTF-8?

>> - record and .pth files created by the install command.
> .pth files contain directory names, which can be non-ASCII.

Agreed.

>> I agree that there is something to be fixed, but I don’t know if they can
>> be fixed in distutils. Unicode in PKG-INFO is unrelated to files, whereas
>> there are files or directories in MANIFEST, spec, record and .pth.
> You can use non-ASCII characters for other topics than filenames. E.g. in a
> description of a package :-)

See above: The description of a distribution is in UTF-8. Note that I don’t really understand my comment anymore; I now think that this should be fixed in distutils with the least intrusive change possible.

>> If this is going to be fixed, write_file should not use UTF-8 unconditionally
>> but grow a keyword argument IMO, so that use cases requiring ASCII
>> continue to work.
> As written before, UTF-8 is a superset of ASCII. If you read a file using the
> utf-8 encoding, you will be able to read ascii files. But if you use utf-8
> and write non-ascii characters, old versions of distutils using ascii or
> another encoding will not be able to read these files.

That’s what I meant: Don’t make write_file always use UTF-8 since some use cases are restricted to ASCII.

> About the keyword solution: yes, it would be a smooth way to fix this issue.

Let’s do it. (Make sys.getdefaultencoding() its default value for compat.)

>> When you say “patch *all* functions reading files”, I guess you mean all
>> functions that read distutils files, i.e. MANIFEST and PKG-INFO.
> I don't know distutils well enough to answer my own question.

You patch writing files, I’ll handle reading files :)

----------
nosy: +a.badger

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9561>
_______________________________________
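
For reference, a minimal sketch of the keyword-argument idea discussed in this comment. It is not the actual distutils patch: the write_file below is a hypothetical standalone stand-in for the distutils helper of the same name, and the demo() helper and the file name used in it are made up for illustration. The comment suggests sys.getdefaultencoding() as the default value; this sketch instead assumes encoding=None, which defers to open()'s own locale-dependent default, so existing callers would see no change in behaviour.

```python
import os
import tempfile


def write_file(filename, contents, encoding=None):
    """Write each string in `contents` as one line of `filename`.

    Hypothetical stand-in for the distutils helper. The `encoding`
    keyword is the proposed addition; None defers to open()'s
    locale-dependent default (an assumption of this sketch).
    """
    with open(filename, "w", encoding=encoding) as f:
        for line in contents:
            f.write(line + "\n")


def demo():
    """Exercise the three cases raised in the discussion."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "MANIFEST")

        # ASCII-only text yields identical bytes under ASCII and UTF-8.
        write_file(path, ["setup.py"], encoding="utf-8")
        with open(path, "rb") as f:
            utf8_bytes = f.read()
        write_file(path, ["setup.py"], encoding="ascii")
        with open(path, "rb") as f:
            ascii_bytes = f.read()
        results["ascii_identical"] = utf8_bytes == ascii_bytes

        # A caller restricted to ASCII gets an error on non-ASCII text.
        try:
            write_file(path, ["napoléon"], encoding="ascii")
            results["ascii_rejects_non_ascii"] = False
        except UnicodeEncodeError:
            results["ascii_rejects_non_ascii"] = True

        # The same text round-trips when UTF-8 is requested explicitly.
        write_file(path, ["napoléon"], encoding="utf-8")
        with open(path, encoding="utf-8") as f:
            results["utf8_round_trip"] = f.read() == "napoléon\n"
    return results


if __name__ == "__main__":
    print(demo())
```

Callers that must stay ASCII-only would pass encoding="ascii" and get an early UnicodeEncodeError, while callers that want portable files would pass encoding="utf-8" on both the writing and the reading side.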