Nathaniel Smith <n...@pobox.com> added the comment:

> If you wanted to keep async disk access separate from the io module, then 
> what we'd have to do is to create a fork of all the code in the io module, 
> and add this feature to it.

Thinking about this again today, I realized there *might* be another option.

The tricky thing about supporting async file I/O is that users want the whole 
io module interface, and we don't want to have to reimplement all the 
functionality in TextIOWrapper, BufferedReader, BufferedWriter, etc. And we 
still need the blocking functionality too, for when we fall back to threads.

But, here's a possible hack. We could implement our own version of 'FileIO' 
that wraps around a real FileIO. Every operation just delegates to the 
underlying FileIO – but with a twist. Something like:

def wrapped_op(self, *args):
    cached = self._cached_op   # a little (key, result) record, or None
    if cached is not None and cached.key == (op, args):
        return cached.result
    if getattr(MAGIC_THREAD_LOCAL, "io_is_forbidden", False):
        def cache_filler():
            # runs in a worker thread, where blocking I/O is allowed
            result = self._real_file.op(*args)
            self._cached_op = CachedOp(key=(op, args), result=result)
        raise IOForbiddenError(cache_filler)
    return self._real_file.op(*args)

And then in order to implement an async operation, we do something like:

async def op(self, *args):
    while True:
        try:
            # First try fulfilling the operation from cache
            MAGIC_THREAD_LOCAL.io_is_forbidden = True
            return self._io_obj.op(*args)
        except IOForbiddenError as exc:
            # We have to actually hit the disk: run the real I/O
            # operation in a thread, then try again
            await in_thread(exc.args[0])   # the cache_filler we raised with
        finally:
            del MAGIC_THREAD_LOCAL.io_is_forbidden
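To make the control flow concrete, here's a toy, self-contained version of the trick. Everything here is invented for illustration (CacheOnlyFile, the dict-backed "disk", using asyncio.to_thread as the in_thread stand-in); the real version would wrap an actual FileIO:

```python
import asyncio
import threading

MAGIC_THREAD_LOCAL = threading.local()

class IOForbiddenError(Exception):
    def __init__(self, cache_filler):
        super().__init__(cache_filler)
        self.cache_filler = cache_filler

class CacheOnlyFile:
    """Toy stand-in for the delegating FileIO wrapper."""
    def __init__(self, data):
        self._data = data    # pretend this bytes object is the disk
        self._cache = {}     # (op, args) -> result

    def read(self, start, n):
        key = ("read", (start, n))
        if key in self._cache:
            return self._cache[key]
        if getattr(MAGIC_THREAD_LOCAL, "io_is_forbidden", False):
            def cache_filler():
                # runs in a worker thread, where blocking I/O is fine
                self._cache[key] = self._data[start:start + n]
            raise IOForbiddenError(cache_filler)
        return self._data[start:start + n]

async def async_read(f, start, n):
    while True:
        try:
            MAGIC_THREAD_LOCAL.io_is_forbidden = True
            return f.read(start, n)       # cache hit -> done
        except IOForbiddenError as exc:
            # fill the cache off-thread, then retry the whole operation
            await asyncio.to_thread(exc.cache_filler)
        finally:
            MAGIC_THREAD_LOCAL.io_is_forbidden = False

print(asyncio.run(async_read(CacheOnlyFile(b"hello world"), 0, 5)))  # b'hello'
```

The first attempt misses the cache and raises; the filler runs in a thread; the retry is served entirely from cache without blocking the event loop.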

This is pretty convoluted: we keep trying the operation on the outer "buffered" 
object, seeing which low-level I/O operation it gets stuck on, doing that I/O 
operation, and trying again. There's all kinds of tricky non-local state here; 
for example, there isn't any formal guarantee that the next time we try 
the "outer" I/O operation it will end up making exactly the same request to the 
"inner" RawIO object. If you try performing I/O operations on the same file 
from multiple tasks concurrently then you'll get all kinds of havoc. But if it 
works, then it does have two advantages:

First, it doesn't require changes to the io module, which is at least nice for 
experimentation.

And second, it's potentially compatible with the io_uring style of async disk 
I/O API. I don't actually know if this matters; if you look at the io_uring 
docs, the only reason they say they're more efficient than a thread pool is 
that they can do the equivalent of preadv(RWF_NOWAIT), and it's a lot easier to 
add preadv(RWF_NOWAIT) to a thread pool than it is to implement io_uring. But 
still, this approach is potentially more flexible than my original idea.
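For reference, the preadv(RWF_NOWAIT) trick is already reachable from Python via os.preadv and os.RWF_NOWAIT (Linux >= 4.14, Python 3.7+). A hedged sketch of the "try the page cache first, else fall back to a thread" probe might look like this (read_from_cache is an invented name):

```python
import os

def read_from_cache(fd, nbytes, offset):
    """Return bytes if they're already in the page cache, else None.

    None means the caller should fall back to doing the read in a
    worker thread. On platforms without RWF_NOWAIT we just report
    a miss.
    """
    if not hasattr(os, "RWF_NOWAIT"):
        return None
    buf = bytearray(nbytes)
    try:
        n = os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
    except OSError:
        # EAGAIN: data not cached; also covers filesystems that
        # don't support RWF_NOWAIT
        return None
    return bytes(buf[:n])
```

Note that the read can be short even on a cache hit, so the thread-pool fallback still has to handle the tail.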

We'd still have to reimplement open() in order to set up our weird custom IO 
stacks, but hopefully that's not *too* bad, since it's mostly just a bunch of 
if statements to decide which wrappers to stick around the raw IO object.
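That wrapper-selection logic is roughly this shape (a simplified sketch of what io.open() does internally; here "raw" would be our delegating FileIO replacement, though the logic is the same with a plain io.FileIO):

```python
import io

def build_stack(raw, mode, buffering=io.DEFAULT_BUFFER_SIZE, encoding=None):
    # Pick the buffered wrapper based on what the raw object supports
    if raw.readable() and raw.writable():
        buffered = io.BufferedRandom(raw, buffering)
    elif raw.writable():
        buffered = io.BufferedWriter(raw, buffering)
    else:
        buffered = io.BufferedReader(raw, buffering)
    if "b" in mode:
        return buffered
    # Text mode: add the decoding layer on top
    return io.TextIOWrapper(buffered, encoding=encoding)
```

So build_stack(io.FileIO(path, "r"), "r") gives a TextIOWrapper over a BufferedReader, much like open(path, "r") would.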

My big concern is that I'm not actually sure this works :-).

The thing is, for this to work, we need 
TextIOWrapper/BufferedReader/BufferedWriter to be very well-behaved when the 
underlying operation raises an exception. In particular, if they're doing a 
complex operation that requires multiple calls to the underlying object, and 
the second call raises an exception, they need to keep the first call's results 
in their buffer so that next time they can pick up where they left off. And I 
have no idea if that's true.

I guess if you squint this is kind of like the non-blocking support in the io 
module – IOForbiddenError plays the role of BlockingIOError. The big difference 
is that 
here, we don't have any "partial success" state at the low-level; either we do 
the operation immediately, or we punt and do the operation in a thread. Either 
way it completes as a single indivisible unit. So that might simplify things? 
From a quick skim of issue13322 it sounds like a lot of the problems with the 
current "non-blocking" mode come from these partial success states, but I 
haven't read it in detail.
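The "partial success" problem is easy to demonstrate on a POSIX box: write through a BufferedWriter into a non-blocking pipe until it fills up, and you get a BlockingIOError whose characters_written attribute tells you how much of the last write actually happened – exactly the awkward in-between state the cache-only scheme avoids:

```python
import io
import os

r, w = os.pipe()
os.set_blocking(w, False)   # make writes to the pipe non-blocking

buf = io.BufferedWriter(io.FileIO(w, "w", closefd=False))
try:
    while True:
        buf.write(b"x" * 4096)
except BlockingIOError as exc:
    # The last write partially succeeded: some bytes made it into the
    # pipe/buffer, the rest didn't. This is the state issue13322 is about.
    print("partial write, characters_written =", exc.characters_written)
finally:
    os.close(r)
    os.close(w)
```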

----------
title: Add API to io objects for non-blocking reads/writes -> Add API to io 
objects for cache-only reads/writes

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32561>
_______________________________________