Eryk Sun <eryk...@gmail.com> added the comment:

Here's a WebDAV example:

    net use Z: \\live.sysinternals.com\tools

Both st_dev (volume serial number) and st_ino (file number) are 0 for all files 
on this drive. _getfinalpathname also fails since WebDAV doesn't support the 
default FILE_NAME_NORMALIZED flag (i.e. replace 8.3 short names in the path 
with normal names). We need a realpath() that reasonably handles cases where 
_getfinalpathname fails. See issue 14094.

samefile() has to go back into ntpath.py on Windows. The generic implementation 
relies on the POSIX guarantee that the tuple (st_dev, st_ino) uniquely 
identifies a file, which Windows doesn't provide.
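
To illustrate the problem, here is a minimal sketch of the generic comparison, assuming it reduces to an equality test on st_ino and st_dev (as genericpath.samestat does). The SimpleNamespace objects are hypothetical stand-ins for the stat results of two distinct files on a volume like the WebDAV drive above:

```python
from types import SimpleNamespace


def generic_samestat(s1, s2):
    # Same test as the POSIX-style generic implementation.
    return s1.st_ino == s2.st_ino and s1.st_dev == s2.st_dev


# Hypothetical stat results for two *different* files on a volume that
# reports neither a volume serial number nor file numbers.
stat_a = SimpleNamespace(st_dev=0, st_ino=0)
stat_b = SimpleNamespace(st_dev=0, st_ino=0)

# The generic test reports a match even though the files are distinct.
false_positive = generic_samestat(stat_a, stat_b)
```

With both IDs zero the tuple carries no information, so every pair of files on such a volume compares as the same file.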

I suggest the following for samefile(). If either st_ino or st_dev is 
different, return False. If they're equal and non-zero, return True. Otherwise 
compare the final paths. If st_ino is zero, compare the entire paths. If st_ino 
is non-zero, then compare only the drives. (This supports the unusual case of 
hardlinks on a volume that has no serial number.) The final paths can come from 
realpath() if issue 14094 is resolved. For example:

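    # Intended for ntpath.py, where os, realpath and splitdrive are
    # already available at module level.
    def samefile(fn1, fn2):
        """Test whether two file names reference the same file."""
        s1 = os.stat(fn1)
        s2 = os.stat(fn2)
        # Any difference in the IDs means the files are different.
        if s1.st_ino != s2.st_ino or s1.st_dev != s2.st_dev:
            return False
        # Equal and non-zero IDs are trusted as a match.
        if s1.st_ino and s1.st_dev:
            return True
        # At least one ID is zero, so fall back on the final paths.
        rp1, rp2 = realpath(fn1), realpath(fn2)
        if s1.st_ino:
            # File numbers match but the volume has no serial number:
            # compare only the drives (allows hardlinks).
            return splitdrive(rp1)[0] == splitdrive(rp2)[0]
        # No file number: compare the entire paths.
        return rp1 == rp2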

For sameopenfile(), it's trivial to extend _getfinalpathname to support the 
argument as a file-descriptor, in which case it simply calls _get_osfhandle to 
get the handle instead of opening the file via CreateFileW. This loses the 
flexibility of realpath(), but below I propose extending the range of paths 
supported by _getfinalpathname.

---

Note that the root directory in FAT32 is file number 0. For NTFS, the file 
number is never 0. The high word of the 64-bit file reference number is a 
sequence number that begins at 1. 
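
As a rough sketch of why the NTFS value can't be zero, assuming the usual layout of a 16-bit sequence number in the high word and a 48-bit file record number in the low bits (the helper name and the sample value are made up for illustration):

```python
def split_file_reference(frn):
    """Split a 64-bit NTFS file reference number into its two parts."""
    sequence = (frn >> 48) & 0xFFFF    # high 16 bits: sequence number
    record = frn & 0xFFFFFFFFFFFF      # low 48 bits: file record number
    return sequence, record


# Hypothetical reference number: sequence 1, record 42.
sample = (1 << 48) | 42
parts = split_file_reference(sample)

# Even record 0 yields a non-zero reference number once the sequence
# number is at least 1.
lowest = (1 << 48) | 0
```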

For local drives the volume serial number shouldn't be zero. It's possible to 
manually change it to zero, but that's intentional mischief. There's a small 
chance that two drives have the same serial number, and an even smaller chance 
that we get an (st_dev, st_ino) match that's a false positive. I'm not happy 
with that, however small the probability, but I don't know a simple way to 
address the problem.

For local storage, Windows does have a device number that's similar to POSIX 
st_dev. It's actually three numbers -- device type (16-bit), device number 
(32-bit), and partition number (32-bit) -- that taken together constitute an 
80-bit ID. The problem is that we have to query IOCTL_STORAGE_GET_DEVICE_NUMBER 
directly using a handle for the volume device. Getting a handle for the volume 
can be expensive since we may be starting from a file handle or have a volume 
that's mounted as a filesystem junction. Plus this lacks generality since it's 
not implemented by MUP (Multiple UNC Provider, the proxy device for UNC 
providers) -- not even for a local SMB share such as "\\localhost\C$". 
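
For illustration only, the three numbers could be combined into a single 80-bit integer as sketched below. The field widths are the ones given above, the helper name is hypothetical, and actually querying IOCTL_STORAGE_GET_DEVICE_NUMBER for the values is not shown:

```python
def storage_device_id(device_type, device_number, partition_number):
    """Pack the three fields into one 80-bit integer (16 + 32 + 32 bits)."""
    if not 0 <= device_type < (1 << 16):
        raise ValueError("device_type must fit in 16 bits")
    if not 0 <= device_number < (1 << 32):
        raise ValueError("device_number must fit in 32 bits")
    if not 0 <= partition_number < (1 << 32):
        raise ValueError("partition_number must fit in 32 bits")
    return (device_type << 64) | (device_number << 32) | partition_number


# Example values: device type 7 (FILE_DEVICE_DISK), disk 0, partition 2.
disk_id = storage_device_id(7, 0, 2)
```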

To improve reliability for corner cases, _getfinalpathname could be extended to 
try all path types, with and without normalization. Start with the DOS name. 
Next try the GUID name (i.e. a device that supports the mountpoint manager but 
isn't auto-mounted as a DOS drive or file-system junction) and finally the NT 
name (i.e. a device that doesn't support the mountpoint manager, such as an 
ImDisk virtual disk, the named-pipe file system, or mailslot file system). For 
an NT path, _getfinalpathname can try to manually resolve a normal mountpoint 
via QueryDosDeviceW, but limit this to just drive-letter names and well-known 
global names from "HKLM\System\CurrentControlSet\Control\Session Manager\DOS 
Devices" (e.g. PIPE, MAILSLOT). If there's no normal mountpoint, prefix the NT 
path with the "\\?\GLOBALROOT" link. For example a file named "\spam" on the 
devices "\Device\ImDisk0" (a ramdisk, say it's mounted as R:), 
"\Device\NamedPipe", and "\Device\NtOnly" would resolve to "\\?\R:\spam", 
"\\?\PIPE\spam", and "\\?\GLOBALROOT\Device\NtOnly\spam".
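
The last step (NT path to a usable final path) might look roughly like the sketch below. It is pure string handling: the function name is hypothetical, and the mountpoints mapping is assumed to have been built beforehand, e.g. from QueryDosDeviceW results restricted to drive letters and the well-known global names. Matching is case-sensitive here to keep it short.

```python
def resolve_nt_path(nt_path, mountpoints):
    """Map an NT path to a DOS-style final path, else use GLOBALROOT.

    mountpoints is a hypothetical mapping of NT device name to DOS
    device name.
    """
    # Try longer device names first so that a device whose name is a
    # prefix of another device's name can't match by accident.
    for device in sorted(mountpoints, key=len, reverse=True):
        if nt_path == device or nt_path.startswith(device + "\\"):
            return "\\\\?\\" + mountpoints[device] + nt_path[len(device):]
    # No normal mountpoint: prefix the NT path with the GLOBALROOT link.
    return r"\\?\GLOBALROOT" + nt_path


# Mapping for the example devices above; "\Device\NtOnly" has no entry.
mountpoints = {
    r"\Device\ImDisk0": "R:",
    r"\Device\NamedPipe": "PIPE",
}

ramdisk = resolve_nt_path(r"\Device\ImDisk0\spam", mountpoints)
pipe = resolve_nt_path(r"\Device\NamedPipe\spam", mountpoints)
nt_only = resolve_nt_path(r"\Device\NtOnly\spam", mountpoints)
```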

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33935>
_______________________________________