Eryk Sun <eryk...@gmail.com> added the comment:

Python 3.8 introduced some behavior changes to how reparse points are 
supported, but generalized support for handling name-surrogate reparse points 
as symlinks was not implemented. Python continues to set S_IFLNK in st_mode 
only for IO_REPARSE_TAG_SYMLINK reparse points. This ensures that if 
os.path.islink() is true, the link can be read and copied exactly via 
os.readlink() and os.symlink(). Otherwise, islink() could be true but 
readlink() would fail, or symlink() would be used to mistakenly copy a 
mountpoint as a symlink. 
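
As a rough illustration of the islink()/readlink()/symlink() contract 
described above, here is a minimal sketch. The helper name copy_link() is 
hypothetical and not part of the standard library; it assumes the caller has 
the privilege to create symlinks.

```python
import os


def copy_link(src, dst):
    """Recreate the symlink at src as dst without following it.

    Hypothetical helper: relies on the guarantee that islink() being
    true means readlink() and symlink() can reproduce the link.
    """
    if not os.path.islink(src):
        raise ValueError(f"{src!r} is not a symlink")
    target = os.readlink(src)
    # target_is_directory only matters on Windows; it is ignored elsewhere.
    os.symlink(target, dst, target_is_directory=os.path.isdir(src))
    return target
```

A mountpoint never reaches the os.symlink() call here, because islink() is 
false for it, so it cannot be silently converted into a symlink.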

A mountpoint is not equivalent to a symlink in a few cases. The major 
difference is that mountpoints are evaluated on the server side in a remote 
path, targeting devices on the server, whereas symlinks are evaluated on the 
client side, targeting devices on the client (e.g. its "C:" drive) and are 
subject to the client system's L2R (local to remote), L2L, R2L, and R2R symlink 
policy. Replacing a mountpoint with a symlink means that, at best, the path 
will no longer work when accessed remotely, and at worst the client will allow 
resolving the target locally to something that's dangerously wrong.

Another difference is how the kernel handles mountpoints when opening a path. 
The target of a mountpoint does not replace the previously traversed path 
components in the opened path, whereas the target path of a symlink does 
replace the opened path. The previously traversed path matters when the kernel 
resolves ".." components in the target of a relative symlink. For example, a 
relative symlink that traverses up the tree with ".." components may have been 
tested on a traversed directory, which worked fine. Then later the directory 
was replaced with a mountpoint (junction) for compatibility, which continued to 
work fine. But after a CopyTree() that naively replaces the mountpoint with a 
symlink, the copied relative symlink is either broken, or worse, it resolves to 
a target that's dangerously wrong.

A generalization of the readlink() and symlink() combination could be 
implemented to copy any type of name-surrogate reparse point. If Python had 
something like that, then it could reasonably support any name-surrogate 
reparse point as a "symlink". That's not without problems, considering the 
behavior isn't the same and APIs and other applications may only support 
IO_REPARSE_TAG_SYMLINK in various cases, but sometimes perfect is the enemy of 
good.

That said, os.walk() can still special case mountpoints and other 
name-surrogate reparse points. To support cases like this, the lstat() result 
was extended to include the st_reparse_tag value of name-surrogate reparse 
points. The stat module has the IO_REPARSE_TAG_SYMLINK and 
IO_REPARSE_TAG_MOUNT_POINT constants. A simple function that checks for a 
name-surrogate reparse point could be added as well -- i.e. bool(reparse_tag & 
0x20000000).
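
A minimal sketch of such a check follows. The function names are 
hypothetical; the only facts assumed are the 0x20000000 name-surrogate bit 
mentioned above and that st_reparse_tag exists on the lstat() result only on 
Windows (Python 3.8+), so other platforms fall back to 0.

```python
import os

# Name-surrogate bit of a reparse tag, as given in the text above.
NAME_SURROGATE_BIT = 0x20000000


def is_name_surrogate_tag(reparse_tag):
    """Return True if the reparse tag has the name-surrogate bit set."""
    return bool(reparse_tag & NAME_SURROGATE_BIT)


def is_name_surrogate(path):
    """Return True if path itself is a name-surrogate reparse point."""
    st = os.lstat(path)
    # st_reparse_tag is Windows-only; treat its absence as "no tag".
    return is_name_surrogate_tag(getattr(st, "st_reparse_tag", 0))
```

For example, both IO_REPARSE_TAG_SYMLINK (0xA000000C) and 
IO_REPARSE_TAG_MOUNT_POINT (0xA0000003) have this bit set, so 
is_name_surrogate_tag() is true for either value.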

---

Using st_reparse_tag to abstract checking the file type is awkward. I wanted to 
support a keyword-only parameter in Windows to expand the 'symlink' domain to 
include all name-surrogate reparse points. This parameter would have been added 
to os.[l]stat(), DirEntry.stat(), DirEntry.is_dir(), and DirEntry.is_file(), as 
well as os.path.islink() and DirEntry.is_symlink(). By default only 
IO_REPARSE_TAG_SYMLINK would have been handled as a symlink. But this idea 
wasn't accepted. Instead, custom checks have to be implemented whenever a 
problem needs the expanded 'symlink' domain.
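
One possible shape for such a custom check, applied to os.walk(), is 
sketched below. Both function names are hypothetical, and the approach 
(pruning link-like directories from dirnames) is just one option, not 
something the os module provides.

```python
import os
import stat


def is_link_like(path):
    """Return True for symlinks and, on Windows, any name surrogate."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return True
    # st_reparse_tag is Windows-only (Python 3.8+); default to 0 elsewhere.
    return bool(getattr(st, "st_reparse_tag", 0) & 0x20000000)


def walk_skipping_link_like(top):
    """Like os.walk(top), but never lists or descends into link-like dirs."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Pruning dirnames in place stops os.walk() from descending.
        dirnames[:] = [
            name for name in dirnames
            if not is_link_like(os.path.join(dirpath, name))
        ]
        yield dirpath, dirnames, filenames
```

With this, a junction is skipped in the same way as a directory symlink, 
without changing what os.path.islink() reports.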

----------
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue23407>
_______________________________________
