On Tue, 05 Jun 2018 17:28:24 +0200, Peter J. Holzer wrote:

[...]
> If a disk with a file system which allows embedded NUL characters is
> mounted on Linux (let's for the sake of the argument assume it is HFS+,
> although I have to admit that I don't know anything about the internals
> of that filesystem), then the low level filesystem code has to map that
> character to something else. Even the generic filesystem code of the
> kernel will never see that NUL character,

Even if this were true, why is it even the tiniest bit relevant to what
os.path.exists() does when given a path containing a NUL byte?

> let alone the user space. As
> far as the OS is concerned, that file doesn't contain a NUL character.

I don't care about "as far as the OS". I care about users, people like
me. If I say "Here's a file called 'sp\0am'" then I don't care what the
OS does, or the FS driver, or the disk hardware. I couldn't care less
what the actual byte pattern on the disk is. If you told me that the
pattern of bytes representing that filename was 0x0102030405 then I'd be
momentarily impressed by the curious pattern and then do my best to
immediately forget all about it.

As a Python programmer, *why do you care* about NULs? How does this
special treatment make your life as a Python programmer better?

> The whole system (except for some low-level FS-dependent code) will
> always only see the mapped name.

Yes. So what? That's *already the case*. Even the Python string you pass
to os.path.exists is already mapped, and errors from the kernel are
mapped to False. Why should NUL be treated differently?

Typical Linux file systems (ext3, ext4, btrfs, ReiserFS etc) don't
support Unicode, only bytes, but we can query "invalid" file names
containing characters like δ, ж or ∆ without any problem. We don't get
ValueError just because of some irrelevant technical detail that the
file system doesn't support characters outside of the range of bytes
1...255 (excluding 47, the ASCII code for "/"). We can do this because
Python seamlessly maps Unicode to bytes and back again.

You may have heard of a little-known operating system called "Windows",
which defaults to NTFS as its file system. I'm told that there are a few
people who use this file system. Even under Linux, you might have
(knowingly or unknowingly) used a network file system or storage device
that used NTFS under the hood.
If so, then every time you query a filename, even an ordinary looking
one like "foo", you could be dealing with multiple NUL bytes, as the
NTFS file system (even under Linux!) uses Unicode file names encoded
with UTF-16. There's a good chance that EVERY filename you've used on a
NAS device or network drive has included embedded NUL bytes.

You've painted a pretty picture of the supposed confusion and difficulty
such NUL bytes would cause, but it's all nonsense. We already can
seamlessly and transparently interact with file systems where file names
include NUL bytes under Linux.

BUT even if what you said were true, that Linux cannot deal with NUL
bytes in file names even with driver support, even if passing a NUL byte
to the Linux kernel would cause the fall of human civilization, that
STILL wouldn't require us to raise ValueError from os.path.exists!

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson
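
P.S. For anyone who wants to check the encoding claims above, here is a
minimal sketch rather than anything definitive. It assumes UTF-8 as the
encoding Python applies to file names (typical on Linux, but it depends
on platform and locale) and little-endian UTF-16 for the NTFS side.
exists_or_false is a hypothetical helper made up for this example, not
part of os.path; it only illustrates the "map the error to False"
behaviour argued for in this post.

```python
import os.path


def exists_or_false(path):
    """Hypothetical helper: an unrepresentable path simply does not exist."""
    try:
        return os.path.exists(path)
    except ValueError:
        # Some Python versions raise ValueError for an embedded NUL;
        # report "no such file" instead of propagating the error.
        return False


# Non-ASCII names are mapped to plain bytes before the file system sees
# them. UTF-8 is an assumption here; the real encoding is platform-dependent.
utf8_name = "δж∆".encode("utf-8")

# Encoding an ordinary ASCII name as UTF-16 puts a NUL byte after
# every character.
utf16_name = "foo".encode("utf-16-le")
nul_count = utf16_name.count(b"\x00")

print(utf8_name)
print(utf16_name, nul_count)
print(exists_or_false("sp\0am"))
```

The helper catches only ValueError on purpose: other failures are
already turned into False by os.path.exists itself, so nothing else
needs special handling in this sketch.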