On 2018-08-09 01:14, Thomas Jollans wrote:
On 09/08/18 01:48, MRAB wrote:
On 2018-08-08 23:16, Thomas Jollans wrote:
On *nix, file names are bytes. In real life, we prefer to think of file
names as strings. How non-ASCII file names are created is determined by
the locale, and on most systems these days, every locale uses UTF-8 and
everybody's happy. Of course this doesn't mean you'll never run into an
old directory tree from the pre-UTF8 age using some other encoding, and
it doesn't prevent people from doing silly things in file names.
Python deals with this tolerably well: by convention, file names are
strings, but you can use bytes for file names if you wish. The docs [1]
warn you about the situation.
[1] https://docs.python.org/3/library/os.path.html
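
A minimal sketch of the str/bytes duality described above. The temporary directory and the file name `example.txt` are made up for illustration; the point is only that the type of the path you pass in decides the type of the names you get back.

```python
import os
import tempfile

# "example.txt" is a hypothetical file name used only for illustration.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str path in gives str names out.
    names_str = os.listdir(tmp)
    # A bytes path in gives the raw bytes names out.
    names_bytes = os.listdir(os.fsencode(tmp))

print(names_str)    # ['example.txt']
print(names_bytes)  # [b'example.txt']
```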
If Python runs into a non-UTF8 (better: non-decodable) file name and has
to return a str, it uses surrogate escape codes. So far so good. Right?
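
To make the surrogate escape mechanism concrete, here is a small sketch that decodes a byte string by hand instead of touching the file system. The name `caf\xe9.txt` is an invented Latin-1 example, and the encoding is assumed to be UTF-8; on POSIX, os.fsdecode() applies the same error handler with whatever the file system encoding happens to be.

```python
raw = b"caf\xe9.txt"  # invented Latin-1 encoded name; not valid UTF-8

# Strict decoding fails on the lone 0xE9 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN,
# so the original bytes can be recovered by encoding with the same handler.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")

print(strict_ok)         # False
print(ascii(name))       # 'caf\udce9.txt'
print(roundtrip == raw)  # True
```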
This leads to the unfortunate situation that you can't always print()
file names, as print() is strict and refuses to toy with surrogates.
To be more explicit, the script
print(__file__)
will fail depending on the file name. This feels wrong... (though every
bit of behaviour is correct)
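
A sketch of the failure without needing an oddly named script on disk. It assumes a strict UTF-8 output stream (an in-memory stand-in for a typical sys.stdout) and an invented file name; the backslashreplace step is just one possible workaround, not something prescribed by the docs.

```python
import io

# Invented name containing one undecodable byte, as Python would return it.
name = b"caf\xe9.py".decode("utf-8", "surrogateescape")

# A strict UTF-8 text stream, standing in for a typical sys.stdout.
strict_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", errors="strict")
try:
    print(name, file=strict_out)
    strict_out.flush()
    failed = False
except UnicodeEncodeError:
    # The lone surrogate cannot be encoded as UTF-8.
    failed = True

# One possible workaround: escape whatever cannot be encoded before printing.
safe = name.encode("utf-8", "backslashreplace").decode("utf-8")

print(failed)  # True
print(safe)    # caf\udce9.py
```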
(The situation can't arise on Windows, and Python 2 will pretend nothing
happened in true UNIX style)
Demo script to try at home below.
[snip]
Is it true that Unix filenames can contain control characters, e.g. \x07?
What happens when you print them out?
I think it's not just a problem with surrogate escapes.
Not a problem (or: not an exception), as those are ASCII and thus UTF-8.
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('\x07.py', 'w') as fp:
...     fp.write('print(__file__)\n')
...
16
>>> import sys; import subprocess
>>> subprocess.call([sys.executable, '\x07.py'])
.py
0
As you might expect, it beeped when printing '\x07.py' (and showed .py)
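
For contrast with the surrogate case, a short sketch of why the control character goes through: BEL is plain ASCII, so encoding never raises. Using repr() to show the name without ringing the bell is merely a suggestion.

```python
name = "\x07.py"

# BEL (0x07) is ASCII, so UTF-8 encoding succeeds without any error handler.
encoded = name.encode("utf-8")

# repr() escapes the control character instead of sending it to the terminal.
shown = repr(name)

print(encoded)  # b'\x07.py'
print(shown)    # '\x07.py'
```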
And that's OK, is it? :-)