Eryk Sun <eryk...@gmail.com> added the comment:

> I've found a catch via ProcessHacker: CreateFile() with 
> GENERIC_WRITE (or FILE_GENERIC_WRITE) additionally grants 
> FILE_READ_ATTRIBUTES for some reason. 

CreateFileW always requests at least SYNCHRONIZE and FILE_READ_ATTRIBUTES 
access.

The I/O manager requires synchronize access if a file is opened in synchronous 
mode. CreateFileW goes a step further. It always requests synchronize access, 
even with asynchronous mode (overlapped). The File object gets signaled when an
I/O request completes, but it's not very useful in the context of overlapping 
requests.

Requesting read-attributes access supports API functions that query certain 
file information. Here are some of the more common queries that require 
read-attributes access:

    FileBasicInformation (GetFileInformationByHandleEx, GetFileTime)
    FileAllInformation (GetFileInformationByHandle)
    FileAttributeTagInformation (GetFileInformationByHandleEx)

Thus os.fstat(fd) can succeed even if the file is opened in O_WRONLY mode.
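A quick way to see this is the sketch below. It opens a scratch file write-only and calls os.fstat() on the fd. On Windows it relies on the implicit FILE_READ_ATTRIBUTES access described above. The same code also runs on POSIX systems, where fstat() never needs read access, so it demonstrates the behavior without proving the Windows mechanism. `fstat_write_only` is a hypothetical helper name.

```python
import os
import tempfile

def fstat_write_only(path: str) -> os.stat_result:
    """Open path write-only (creating it if needed), write a little, and stat the fd."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT)
    try:
        os.write(fd, b"spam")
        # Succeeds even though the fd was not opened for reading.
        return os.fstat(fd)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    st = fstat_write_only(os.path.join(tmp, "spam.txt"))
    print(st.st_size)  # 4
```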

CreateFileW also implicitly requests DELETE access if FILE_FLAG_DELETE_ON_CLOSE 
is used, instead of letting the call fail with an invalid-parameter error if 
delete access isn't requested. This behavior isn't documented.
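To make the implicit rights concrete, here is a small model of the access mask that actually gets requested, following the description above. It is not CreateFileW's real code. The constants are the standard Windows SDK values, and `effective_access` is a hypothetical name.

```python
# Access-right and flag values from the Windows SDK headers (winnt.h, winbase.h).
SYNCHRONIZE = 0x00100000
DELETE = 0x00010000
FILE_READ_ATTRIBUTES = 0x00000080
FILE_GENERIC_WRITE = 0x00120116  # what GENERIC_WRITE maps to for files
FILE_FLAG_DELETE_ON_CLOSE = 0x04000000

def effective_access(desired_access: int, flags_and_attributes: int = 0) -> int:
    """Model (not the real implementation) of the mask CreateFileW requests."""
    # Always added, even for overlapped (asynchronous) opens.
    access = desired_access | SYNCHRONIZE | FILE_READ_ATTRIBUTES
    if flags_and_attributes & FILE_FLAG_DELETE_ON_CLOSE:
        # Undocumented: DELETE is added instead of failing with an invalid parameter.
        access |= DELETE
    return access

write_only = effective_access(FILE_GENERIC_WRITE)
print(bool(write_only & FILE_READ_ATTRIBUTES))  # True
temp = effective_access(FILE_GENERIC_WRITE, FILE_FLAG_DELETE_ON_CLOSE)
print(bool(temp & DELETE))  # True
```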

> undoing the side effect applies to O_CREAT and O_TRUNC too: we can create
> and/or truncate the file, but then fail.

I think truncation via TRUNCATE_EXISTING (O_TRUNC, with O_WRONLY or O_RDWR) or 
overwriting with CREATE_ALWAYS (O_CREAT | O_TRUNC) is at least tolerable 
because the caller doesn't care about the existing data. When overwriting, the 
caller also wants to remove any alternate data streams and extended attributes 
in the file. Nothing important is lost. Also, since both cases retain the 
original file's security descriptor, at least failure after truncation or 
overwriting isn't a security hole.
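For reference, the os.open() flags discussed here map to CreateFileW creation dispositions roughly as follows. This sketch assumes the MSVC CRT's precedence (with O_CREAT set, O_EXCL wins over O_TRUNC) and returns disposition names as strings rather than numeric constants. `creation_disposition` is a hypothetical helper.

```python
import os

def creation_disposition(flags: int) -> str:
    """Map os.open() flags to the name of a CreateFileW creation disposition."""
    creat = bool(flags & os.O_CREAT)
    excl = bool(flags & os.O_EXCL)
    trunc = bool(flags & os.O_TRUNC)
    if creat and excl:
        return "CREATE_NEW"          # fail if the file exists
    if creat and trunc:
        return "CREATE_ALWAYS"       # create, or overwrite an existing file
    if creat:
        return "OPEN_ALWAYS"         # open, or create if missing
    if trunc:
        return "TRUNCATE_EXISTING"   # must exist; truncated to 0 bytes
    return "OPEN_EXISTING"
```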

Unless we require CREATE_NEW (O_CREAT | O_EXCL) whenever O_TEMPORARY is used 
(i.e. as the tempfile module uses it), there is a potential for an existing 
file to be deleted if all handles are closed on failure, as discussed 
previously. This is unacceptable not only because of potential unrecoverable 
data loss, but also because the security descriptor is lost. 

With OPEN_ALWAYS (O_CREAT), CREATE_ALWAYS or CREATE_NEW, there's the chance of 
leaving behind a new empty file or alternate data stream on failure, which is a 
problem, but at least nothing is lost.
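The trade-offs in the last three paragraphs can be summarized as a hypothetical `failure_risks` function. It takes a disposition name and whether delete-on-close (O_TEMPORARY) is in effect, and assumes that on failure every handle is closed, as discussed.

```python
def failure_risks(disposition: str, delete_on_close: bool) -> set:
    """Hypothetical summary of what a failed open can leave behind."""
    risks = set()
    if delete_on_close and disposition != "CREATE_NEW":
        # Closing all handles may delete a pre-existing file, security descriptor included.
        risks.add("existing file deleted")
    if disposition in ("TRUNCATE_EXISTING", "CREATE_ALWAYS"):
        # Tolerable: the caller didn't care about the old data anyway.
        risks.add("existing data discarded")
    if disposition in ("OPEN_ALWAYS", "CREATE_ALWAYS", "CREATE_NEW") and not delete_on_close:
        # With delete-on-close, a newly created file is removed when the handles close.
        risks.add("new empty file left behind")
    return risks
```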

> _open_osfhandle() can still fail with EMFILE. 

The CRT supports 8192 open file descriptors (128 arrays of 64 fds), so failing
with EMFILE should be rare and happen only in extreme cases. There's also a remote possibility
of memory corruption that causes __acrt_lowio_set_os_handle() to fail with 
EBADF because the fd value is negative, or its handle value isn't the default 
INVALID_HANDLE_VALUE, or the CRT _nhandle count is corrupt. These aren't 
practical concerns, just as DuplicateHandle() failing isn't a practical 
concern, but failure should be handled conservatively.
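A sketch of the minimum conservative handling: if wrapping the native handle in an fd fails, close the handle instead of leaking it. `handle_to_fd` is hypothetical; `to_fd` stands in for something like `lambda h: msvcrt.open_osfhandle(h, 0)` and `close_handle` for CloseHandle, both injected so the logic can be exercised on any platform. This covers only the handle itself; undoing a create or truncate needs more, as discussed above.

```python
import errno

# CRT capacity quoted above: 128 arrays of 64 descriptors each.
CRT_MAX_FDS = 128 * 64

def handle_to_fd(handle, to_fd, close_handle):
    """Wrap a native handle in a CRT fd, closing the handle if that fails."""
    try:
        return to_fd(handle)
    except OSError:
        # e.g. EMFILE when all CRT descriptors are in use.
        close_handle(handle)
        raise
```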

> the same issue would apply even in case of direct implementation of 
> os.open()/open() via CreateFile() 

Migrating to CreateFileW() might need to be shelved until Python uses native OS 
File handles instead of CRT file descriptors. The remaining reliance on the CRT 
low I/O layer ties our hands for now.

> Truncation can simply be deferred until we have the fd and then performed 
> manually.

What if it fails after overwriting an existing file? Manually overwriting only 
after getting the new fd is complicated. To match CREATE_ALWAYS (O_CREAT | 
O_TRUNC), before overwriting it would have to query the existing file 
attributes and fail the call if FILE_ATTRIBUTE_HIDDEN or FILE_ATTRIBUTE_SYSTEM 
is set. If the file itself has to be overwritten (i.e. the default, anonymous 
data stream), as opposed to a named data stream, it would have to delete all 
named data streams and extended attributes in the file. Normally that's all 
implemented atomically in the filesystem. 
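Only the first step of that emulation is easy to sketch: the attribute precondition. The hypothetical `may_overwrite` below checks the existing file's attributes (on Windows, `os.stat(path).st_file_attributes`), assuming, as with the CRT, that neither FILE_ATTRIBUTE_HIDDEN nor FILE_ATTRIBUTE_SYSTEM is requested for the new file. Deleting named streams and extended attributes is not modeled.

```python
import stat

def may_overwrite(file_attributes: int) -> bool:
    """Attribute check that CREATE_ALWAYS performs before overwriting (sketch)."""
    blocked = stat.FILE_ATTRIBUTE_HIDDEN | stat.FILE_ATTRIBUTE_SYSTEM
    # Overwriting a hidden or system file fails with access denied.
    return not (file_attributes & blocked)
```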

In contrast, TRUNCATE_EXISTING (O_TRUNC) is simple to emulate, since
CreateFileW implements it non-atomically with a subsequent NtSetInformationFile:
FileAllocationInformation system call.
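A minimal sketch of that emulation: obtain the fd first, then truncate. `open_truncate_existing` is hypothetical, and `os.ftruncate()` stands in for the FileAllocationInformation call that CreateFileW makes.

```python
import os

def open_truncate_existing(path: str, flags: int = os.O_WRONLY) -> int:
    """Emulate O_TRUNC (TRUNCATE_EXISTING): open an existing file, then truncate it."""
    # No O_CREAT: like TRUNCATE_EXISTING, fail if the file doesn't exist.
    fd = os.open(path, flags & ~(os.O_TRUNC | os.O_CREAT))
    try:
        os.ftruncate(fd, 0)
    except OSError:
        # No file was created, so closing the fd is all the cleanup needed.
        os.close(fd)
        raise
    return fd
```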

> But I still don't know how to deal with O_TEMPORARY, unless there is a 
> way to unset FILE_DELETE_ON_CLOSE on a handle.

For now, that's possible with NTFS and the Windows API in all supported
versions of Windows by using a second kernel File object with DELETE access,
which is opened before the last handle to the first kernel File object is
closed. After you close the first open, use the second one to call
SetFileInformationByHandle: FileDispositionInfo to undelete the file. That said, if NTFS changes the
default for delete-on-close to use a POSIX-style delete (immediate unlink), it 
won't be possible to 'undelete' the file.
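A minimal ctypes sketch of the undelete step, assuming the caller already holds the second handle. FileDispositionInfo is 4 in FILE_INFO_BY_HANDLE_CLASS, and FILE_DISPOSITION_INFO holds a single BOOLEAN. `set_delete_disposition` is a hypothetical name.

```python
import ctypes
import sys

FileDispositionInfo = 4  # FILE_INFO_BY_HANDLE_CLASS value

class FILE_DISPOSITION_INFO(ctypes.Structure):
    # BOOLEAN DeleteFile; BOOLEAN is an unsigned char in the Windows headers.
    _fields_ = [("DeleteFile", ctypes.c_ubyte)]

def set_delete_disposition(handle: int, delete: bool) -> None:
    """Set or clear the delete disposition via a handle with DELETE access (Windows only)."""
    if sys.platform != "win32":
        raise OSError("requires Windows")
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.SetFileInformationByHandle
    func.argtypes = (wintypes.HANDLE, ctypes.c_int, ctypes.c_void_p, wintypes.DWORD)
    func.restype = wintypes.BOOL
    info = FILE_DISPOSITION_INFO(1 if delete else 0)
    if not func(handle, FileDispositionInfo, ctypes.byref(info), ctypes.sizeof(info)):
        raise ctypes.WinError(ctypes.get_last_error())
```

Hedged usage: with h1 opened with delete-on-close and h2 a second handle to the same file, opened with DELETE access (and itself not delete-on-close) before h1 is closed, close h1 and then call `set_delete_disposition(h2, False)` before closing h2.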

Windows 10 supports additional flags with FileDispositionInfoEx (21), or NTAPI 
FileDispositionInformationEx [1]. This provides a better way to disable or 
modify the delete-on-close state per kernel File object, if the filesystem 
supports it. If FILE_DISPOSITION_ON_CLOSE (8) is set with 
FILE_DISPOSITION_DO_NOT_DELETE (0), the on-close disposition will be disabled. 
It is not possible, as far as I know, to enable it again. For example:

    >>> import ctypes, msvcrt, os
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> fd = os.open('spam.txt', os.O_TEMPORARY|os.O_CREAT)
    >>> h = msvcrt.get_osfhandle(fd)
    >>> info = ctypes.c_ulong(8)
    >>> kernel32.SetFileInformationByHandle(h, 21, ctypes.byref(info), ctypes.sizeof(info))
    1
    >>> os.close(fd)
    >>> os.path.exists('spam.txt')
    True

If FILE_DISPOSITION_ON_CLOSE is set with FILE_DISPOSITION_DELETE (1) and 
FILE_DISPOSITION_POSIX_SEMANTICS (2), the delete-on-close behavior is changed 
to use POSIX semantics, which immediately unlinks the file even if there are 
existing opens. For example:

    >>> fd = os.open('spam.txt', os.O_TEMPORARY|os.O_CREAT)
    >>> h = msvcrt.get_osfhandle(fd)
    >>> info = ctypes.c_ulong(8|2|1)
    >>> kernel32.SetFileInformationByHandle(h, 21, ctypes.byref(info), ctypes.sizeof(info))
    1

Add a second open:

    >>> fd2 = os.open('spam.txt', os.O_TEMPORARY)

Normally the second open would keep the file linked in the directory after it's 
'deleted', but not with POSIX semantics:

    >>> os.close(fd)
    >>> os.path.exists('spam.txt')
    False
    >>> 'spam.txt' in os.listdir('.')
    False
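The two examples above use magic numbers. A slightly tidier sketch names the FILE_DISPOSITION_INFO_EX flags quoted here and sets explicit `argtypes`, so the handle is passed as a HANDLE rather than through ctypes' default C int conversion. `set_disposition_ex` and the two composite constants are hypothetical names.

```python
import ctypes
import sys

FileDispositionInfoEx = 21  # FILE_INFO_BY_HANDLE_CLASS value used above

# FILE_DISPOSITION_INFO_EX flag values quoted in this comment.
FILE_DISPOSITION_DO_NOT_DELETE = 0x0
FILE_DISPOSITION_DELETE = 0x1
FILE_DISPOSITION_POSIX_SEMANTICS = 0x2
FILE_DISPOSITION_ON_CLOSE = 0x8

# Disable delete-on-close for this File object (first example).
DISABLE_ON_CLOSE = FILE_DISPOSITION_ON_CLOSE | FILE_DISPOSITION_DO_NOT_DELETE
# Make delete-on-close use POSIX semantics (second example).
POSIX_ON_CLOSE = (FILE_DISPOSITION_ON_CLOSE | FILE_DISPOSITION_POSIX_SEMANTICS
                  | FILE_DISPOSITION_DELETE)

def set_disposition_ex(handle: int, flags: int) -> None:
    """Windows 10+ only; the filesystem must support FileDispositionInfoEx."""
    if sys.platform != "win32":
        raise OSError("requires Windows")
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.SetFileInformationByHandle
    func.argtypes = (wintypes.HANDLE, ctypes.c_int, ctypes.c_void_p, wintypes.DWORD)
    func.restype = wintypes.BOOL
    info = ctypes.c_ulong(flags)  # FILE_DISPOSITION_INFO_EX: a single ULONG Flags
    if not func(handle, FileDispositionInfoEx, ctypes.byref(info), ctypes.sizeof(info)):
        raise ctypes.WinError(ctypes.get_last_error())
```

With this, the first example becomes `set_disposition_ex(h, DISABLE_ON_CLOSE)` and the second `set_disposition_ex(h, POSIX_ON_CLOSE)`.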

---

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/ns-ntddk-_file_disposition_information_ex

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42606>
_______________________________________