New submission from Nathaniel Smith <n...@pobox.com>:

Background: I/O to files on disk has hugely bimodal latency. If the operation can 
be satisfied from cache (either a user-space cache, like io.BufferedIOBase's 
buffer, or the OS's page cache), then it returns essentially instantly (~1 µs) 
without blocking. On the other hand, if the data isn't cached (for reads) or the 
write can't simply be staged in cache, then the operation may block for 10 ms or more.

This creates a problem for async programs that want to do disk I/O. You have to 
use a thread pool for reads/writes, because sometimes they block for a long 
time, and you want to let your event loop keep doing other useful work while 
it's waiting. But dispatching to a thread pool adds a lot of overhead (~100 
µs), so you'd really rather not do it for operations that can be serviced 
directly from cache. For uncached operations, offloading to a thread is roughly a 
100x win; for cached operations it's roughly a 100x loss, and -- this is the 
kicker -- there's no way to predict ahead of time which case you're in.
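
For concreteness, here's roughly what that unconditional thread-pool dispatch 
looks like today with asyncio (trio and curio have equivalent helpers); the 
function name is just illustrative:

import asyncio

async def read_via_thread(fileobj, size=-1):
    # Status quo: every read is shipped to a worker thread, paying the
    # ~100 µs dispatch overhead even when the data is already sitting in
    # io.BufferedReader's buffer or the OS page cache.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fileobj.read, size)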

But io.BufferedIOBase, at least, knows when it can satisfy a request directly 
from its buffer without issuing any syscalls. And as of Linux 4.14, it's even 
possible to issue a non-blocking read to the kernel that will only succeed if 
the data is immediately available in the page cache (bpo-31368).
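
For reference, the kernel feature here is preadv2() with the RWF_NOWAIT flag, 
which is exposed at the os level as os.preadv plus os.RWF_NOWAIT (Python 3.7+, 
Linux 4.14+). A rough sketch of the behaviour, with an illustrative helper name:

import os

def try_cached_pread(fd, nbytes, offset):
    # Succeeds only if the bytes are already in the page cache; otherwise
    # the kernel returns EAGAIN, which Python raises as BlockingIOError.
    buf = bytearray(nbytes)
    try:
        n = os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
    except BlockingIOError:
        return None  # not cached; the caller falls back to a worker thread
    return bytes(buf[:n])

This only covers raw reads on regular files, though, which is part of why it 
would be nice for the io stack itself to expose a "try the cache first" operation.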

So, it would be very nice if there were some way to ask a Python file object to 
do a "nonblocking read/write", which either succeeds immediately or else raises 
an error. The intended usage pattern would be:

async def read(self, *args):
    try:
        # Fast path: the request can be served from cache without blocking.
        return self._fileobj.read(*args, nonblock=True)
    except BlockingIOError: # maybe?
        # Slow path: not cached, so pay the thread dispatch overhead.
        return await run_in_worker_thread(self._fileobj.read, *args)

It would *really* help for this to be in the Python core, because right now the 
convenient way to do non-blocking disk I/O is to reuse the existing Python I/O 
stack and push the blocking calls onto worker threads. (This is how both aiofiles 
and trio's async file support work; I think curio's may too.) But to implement 
this feature ourselves, we'd have to first reimplement the whole I/O stack, 
because the important caching information, and the choice of which syscall to 
use, are hidden inside it.

----------
components: IO
messages: 310032
nosy: benjamin.peterson, njs, stutzbach
priority: normal
severity: normal
status: open
title: Add API to io objects for non-blocking reads/writes
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32561>
_______________________________________