Hi Yury,

This is really cool. Some notes on a first read:
1. Excellent work on optimizing dict; that seems valuable independent of the rest of the details here.

2. The text doesn't mention async generators at all. I assume they also have an agi_isolated_execution_context flag that can be set, to enable @asynccontextmanager?

2a. Speaking of which, I wonder if it's possible for async_generator to emulate this flag... I don't know if this matters -- at this point the main reason to use async_generator is for code that wants to support PyPy. If PyPy gains native async generator support before CPython 3.7 comes out, then async_generator may be entirely irrelevant before PEP 550 matters. But right now async_generator is still quite handy...

2b. BTW, the contextmanager trick is quite nice -- I actually noticed last week that PEP 521 had a problem here, but didn't think of a solution :-).

3. You're right that numpy is *very* performance-sensitive about accessing the context -- the errstate object is needed extremely frequently, even on trivial operations like adding two scalars, so a dict lookup is very noticeable. (Imagine adding a dict lookup to float.__add__.) Right now, the errstate object gets stored in the threadstate dict, and then there are some dubious-looking hacks involving a global (not thread-local) counter to let us skip the lookup entirely if we think that no errstate object has been set. (I've put a rough sketch of this hack after point 4 below.) Really what we ought to be doing (currently, in a non-PEP-550 world) is storing the errstate in a __thread variable -- it'd certainly be worth it. Adopting PEP 550 would definitely be easier if we knew that it wasn't ruling out that level of optimization.

4. I'm worried that all of your examples use string keys. One of the great things about threading.local objects is that each one is a new namespace, which is a honking great idea -- here it prevents accidental collisions between unrelated libraries. And while it's possible to implement threading.local in terms of the threadstate dict (that's how they work now!), it requires some extremely finicky code to get the memory management right:

  https://github.com/python/cpython/blob/dadca480c5b7c5cf425d423316cd695bc5db3023/Modules/_threadmodule.c#L558-L595

It seems like you're imagining that this API will be used directly by user code? Is that true? ...Are you sure that's a good idea? Are we just assuming that not many keys will be used and the keys will generally be immortal anyway, so leaking entries is OK? Maybe this is nit-picking, but this is hooking into the language semantics in such a deep way that I sorta feel like it would be bad to end up with something where we can never get garbage collection right. (A second sketch below shows what I mean by a namespace object.) The suggested index-based API for super-fast C lookup also has this problem, but that would be such a low-level API -- and not part of the language definition -- that the right answer is probably just to document that there's no way to unallocate indices, so any given C library should only allocate, like... 1 of them. Maybe provide an explicit API to release an index, if we really want to get fancy.
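Going back to point 3 for a second, the counter hack looks roughly like this -- a pure-Python sketch, purely illustrative (numpy's actual code is in C, and all the names here are made up):

    import threading

    _errstate_set_count = 0        # global, *not* thread-local
    _default_errstate = {"all": "warn"}
    _tstate = threading.local()    # stands in for the threadstate dict

    def set_errstate(errstate):
        global _errstate_set_count
        _errstate_set_count += 1
        _tstate.errstate = errstate

    def get_errstate():
        # Fast path: if no one has *ever* set an errstate, skip the
        # lookup entirely. The counter is global rather than
        # thread-local precisely because checking a plain global is
        # cheaper than doing a thread-local lookup.
        if _errstate_set_count == 0:
            return _default_errstate
        return getattr(_tstate, "errstate", _default_errstate)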
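And to make point 4's namespace idea concrete, here's a minimal sketch of a threading.local-style object built on the PEP's draft get/set-item functions (sys.get_execution_context_item / sys.set_execution_context_item, if I've got the names right -- the ContextLocal class itself is hypothetical):

    import sys

    class ContextLocal:
        """Hypothetical namespace wrapper over PEP 550's draft API."""

        def __init__(self):
            # Embed id(self) in the key to avoid collisions between
            # unrelated libraries -- but note that the entry leaks when
            # this object is garbage collected, which is exactly the
            # memory-management problem described above.
            self._key = "context_local-%d" % id(self)

        def set(self, value):
            sys.set_execution_context_item(self._key, value)

        def get(self, default=None):
            # None signals "missing" here, which collides with the
            # PEP's None-means-delete behavior (see point 5a below).
            value = sys.get_execution_context_item(self._key)
            return default if value is None else value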
5. Is there some performance-related reason that the API for getting/setting isn't just sys.get_execution_context()[...] = ...? Or even sys.execution_context[...]?

5a. Speaking of which, I'm not a big fan of the None-means-delete behavior. Not only does Python have a nice standard way to describe all the mapping operations without such hacks, but you're actually implementing that whole interface anyway. Why not use it?

6. Should Thread.start inherit the execution context from the spawning thread?

7. Compatibility: it does sort of break 3rd-party contextmanager implementations (contextlib2, asyncio_extras's acontextmanager, trio's internal acontextmanager, ...). This is extremely minor, though.

8. You discuss how this works for asyncio and gevent. Have you looked at how it will interact with tornado's context handling system? Can they use this? It's the most important extant context implementation I can think of (aside from thread-local storage itself).

9. OK, my big question, about semantics.

The PEP's design is based on the assumption that all context-local state is scalar-like, and contexts split but never join. But there are some cases where this isn't true, in particular for values that have "stack-like" semantics. (These are terms I just made up, but let me give some examples.) Python's sys.exc_info is one. Another I ran into recently is trio's cancel scopes.

So basically the background is: in trio you can wrap a context manager around any arbitrary chunk of code and then set a timeout or explicitly cancel that code. It's called a "cancel scope". These are fully nestable. Full details here:

  https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts

Currently, the implementation involves keeping a stack of cancel scopes in Task-local storage. This works fine for regular async code, because when we switch Tasks, we also switch the cancel scope stack. But of course it falls apart for generators/async generators:

    async def agen():
        with fail_after(10):  # 10 second timeout for finishing this block
            await some_blocking_operation()
            yield
            await another_blocking_operation()

    async def caller():
        with fail_after(20):
            ag = agen()
            await ag.__anext__()

            # now that cancel scope is on the stack, even though we're not
            # inside the context manager! this will not end well.
            await some_blocking_operation()  # this might get cancelled when it shouldn't

        # even if it doesn't, we'll crash here when exiting the context manager
        # because we try to pop a cancel scope that isn't at the top of the stack

So I was thinking about whether I could implement this using PEP 550. It requires some cleverness, but I could switch to representing the stack as a singly-linked list, and then snapshot it and pass it back to the coroutine runner every time I yield. That would fix the case above.
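Concretely, the linked-list trick might look something like this -- a minimal sketch with made-up names, assuming nodes are immutable once created so that a "snapshot" is just a reference to the current head:

    class CancelScopeNode:
        """Hypothetical immutable node in a linked-list scope stack."""
        __slots__ = ("scope", "parent")

        def __init__(self, scope, parent):
            self.scope = scope
            self.parent = parent

    def push_scope(head, scope):
        # O(1); never mutates existing nodes, so old snapshots stay valid.
        return CancelScopeNode(scope, head)

    def pop_scope(head):
        # O(1); returns (scope, new head).
        return head.scope, head.parent

    # At each yield, the generator would hand its current head back to
    # the coroutine runner, and the runner would restore it on resume.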
But, I think there's another case that's kind of a showstopper:

    async def agen():
        await some_blocking_operation()
        yield

    async def caller():
        ag = agen()  # context is captured here
        with fail_after(10):
            await ag.__anext__()

Currently this case works correctly: the timeout is applied to the __anext__ call, as you'd expect. But with PEP 550, it wouldn't work: the generator's timeouts would all be fixed when it was instantiated, and we wouldn't be able to detect that the second call has a timeout imposed on it. So that's a pretty nasty footgun. Any time you have code that's supposed to have a timeout applied, but in fact has no timeout applied, that's a really serious bug -- it can lead to hangs, trivial DoS, pagers going off, etc.

Another problem is code like:

    async def caller():
        with fail_after(10):
            ag = agen()
        # then exit the scope

Can we clean up the cancel scope (e.g., remove it from the global priority queue that tracks timeouts)? Normally yes -- that's what __exit__ blocks are for, letting you know deterministically that an object can be cleaned up. But here it got captured by the async generator. I really don't want to have to rely on the GC, because on PyPy it means that we could leak an unbounded number of cancel scopes for a finite but unbounded amount of time, and all those extra entries in the global timeout priority queue aren't free. (And sys.exc_info has had buggy behavior in analogous situations.)

So, I'm wondering if you (or anyone) have any ideas how to fix this :-). Technically, PEP 521 is powerful enough to do it, but in practice the performance would be catastrophically bad. It's one thing to have some extra cost to yielding out of an np.errstate block; those are rare, and yielding out of them is rare. But cancel scopes are different: essentially all code in trio runs inside one or more of them, so every coroutine suspend/resume would have to call all those suspend/resume hooks up and down the stack. OTOH, PEP 550 is fast, but AFAICT its semantics are wrong for this use case.

The basic invariant I want is: if at any given moment you stop and take a backtrace, and then look at the syntactic surroundings of each line in the backtrace and write down a list of all the 'with' blocks that the code *looks* like it's inside, then context lookups should give the same result as they would if you simply entered all of those with blocks in order. Generators make it tricky to maintain this invariant, because a generator frame's backtrace changes every time you call next(). But those are the semantics that make the most sense to me, and seem least surprising in practice. These are also, IIUC, the semantics that exc_info is supposed to follow (though historically the interaction of exc_info and generators has had lots of bugs; not sure if that's been fixed or not).

...and now that I've written that down, I sort of feel like that might be what you want for all the other sorts of context object too? Like, here's a convoluted example:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        print(a + b)
        yield
        print(a + b)

    def caller():
        # let's pretend this context manager exists;
        # the actual API is more complicated
        with decimal_context_precision(3):
            g = gen()
        with decimal_context_precision(2):
            next(g)
        with decimal_context_precision(1):
            next(g)

Currently, this will print "3.3 3", because when the generator is resumed it inherits the context of the resuming site. With PEP 550, it would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear from the text), because it inherits the context when the generator is created and then ignores the calling context. It's hard to get strong intuitions, but I feel like the current behavior is actually more sensible -- each time the generator gets resumed, the next bit of code runs in the context of whoever called next(), and the generator is just passively inheriting context, so ... that makes sense.

OTOH, of course, if you change the generator code to:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        with decimal_context_precision(4):
            print(a + b)
            yield
        print(a + b)

then it should print "3.333 3.333", because the generator is overriding the caller -- now when we resume the frame we're re-entering the decimal_context_precision(4) block, so it should take priority.

So ... maybe all context variables are "stack-like"?
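To make that concrete, here's a toy, pure-Python model of the "stack-like" semantics I'm gesturing at -- purely illustrative, not a proposal for how to actually implement it:

    import contextlib

    _context_stack = [{}]  # toy: a stack of dicts, innermost last

    def lookup(key, default=None):
        # Search from the innermost enclosing 'with' block outward.
        for frame in reversed(_context_stack):
            if key in frame:
                return frame[key]
        return default

    @contextlib.contextmanager
    def set_context(key, value):
        _context_stack.append({key: value})
        try:
            yield
        finally:
            _context_stack.pop()

    # The idea: a generator would pop the entries it pushed at each
    # yield, and re-push them on resume -- so lookups always see the
    # caller's *current* context underneath the generator's own
    # 'with' blocks, matching the backtrace invariant above.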
-n

--
Nathaniel J. Smith -- https://vorpus.org