New submission from Nathaniel Smith <n...@pobox.com>:

sys.set_coroutine_wrapper is a problematic API. It was added to solve a 
specific problem: asyncio debug mode wanting to track where coroutine objects 
are created, so that when unawaited coroutines are GC'ed, the warning they 
print can include a traceback. But to do this, it adds a *very* generic hook 
that lets you hook into the coroutine creation code to replace all instances of 
a built-in type with an arbitrary new type, which is highly unusual. It's hard 
to use correctly (see bpo-30578, and bpo-24342 before it). It's hard to use 
without creating performance problems, since anything you do here tends to add 
overhead to every async function call, and in many cases also every async 
function frame suspend/resume cycle.

Fortunately, it's explicitly documented as provisional. (I think Yury 
intentionally excluded it from the stabilization of the rest of asyncio and 
async/await, because of the reasons above.) And the things we *actually* want 
to do here are very simple. The only known use cases are the one mentioned 
above (see asyncio.coroutines.CoroWrapper), and the one in bpo-30491. So after 
discussions with Yury, I'm proposing to migrate those both into the interpreter 
as directly usable, fast, built-in features that can be used individually or 
together, and deprecate sys.set_coroutine_wrapper.

See bpo-30491 for details on that use case; here I'll discuss my plan for 
replacing the "coroutine origin tracking" that asyncio debug mode does.

We add a new function sys.set_coroutine_origin_tracking(depth), and I guess 
sys.get_coroutine_origin_tracking() because why not. The depth is thread-local, 
and defaults to 0, which means that newly created coroutines don't capture any 
origin info.

When a coroutine is created, it will check the current origin_tracking depth, 
and capture that many frames of traceback information. Like the current asyncio 
debug code and unlike real tracebacks, we won't capture actual frame objects -- 
we'll just capture (filename, lineno, function) information, to avoid pinning 
objects in memory. This is a debug helper, so it should avoid changing 
semantics whenever possible.

Later, we can retrieve the captured information by accessing the new attribute 
coro.origin.

In addition, the code in CoroutineType.__del__ that currently emits a warning 
for unawaited coroutines, will be modified to check for whether 'self' has 
origin information, and print it if so, similar to the current asyncio debug 
wrapper.

Some particular points where I'd appreciate feedback:

Should we add an envvar to configure coroutine source tracking? What about an 
-X switch?

Should it be coro.origin or coro.cr_origin? On the one hand the cr_ prefixing 
is super annoying and I think we should probably get rid of it; on the other 
maybe we should keep it here so things stay consistent and then have a separate 
the debate about removing it in general.

Most important: what format should we use for storing the origin information? I 
can see three plausible approaches:

1. Call traceback.StackSummary.extract. This means re-entering the Python 
interpreter from inside the coroutine setup code, which makes me a little 
nervous. I guess there can't be any real re-entrancy issues here because if 
there were, set_coroutine_wrapper would already be hitting them. Can it cause 
problems during shutdown when the traceback module might have disappeared? (Not 
that anyone's likely to be instantiating coroutines at that point.) It also 
might be a bit slower than the other options below, but this isn't clear. OTOH, 
it has the benefit that coro.origin could be a full-fledged StackSummary with 
all its features, like traceback printing. Which honestly feels a little weird 
to me, because I know coroutine objects are built-in and StackSummary objects 
aren't, but it would certainly be convenient.

2. Capture (filename, lineno, function) directly in a packed struct, similar to 
how tracemalloc works. Then coro.origin can unpack this into a list of tuples 
or whatever. This is definitely very fast, and avoids re-entering Python. In 
practice we'd probably still end up needing to re-enter Python and using the 
traceback module when it came time to print a warning, though, because 
traceback printing is complicated and we don't want to reimplement it. Also, if 
the origin comes from a custom importer like zipimport, then we may not be able 
to look up the line information later, because that requires access to 
frame.f_globals["__loader__"].

3. Like (2), but insert a call to linecache.lazycache for each origin frame we 
capture. This would fix option (2)'s issue with printing source lines, but it 
means re-entering Python as well, so at this point maybe it'd just be simpler 
to go with option (1).

Given this analysis I guess I'm leaning towards just calling 
StackSummary.extract, but I don't feel like I fully understand the consequences 
of calling into a Python module like that so want to hear what others think.

----------
components: Interpreter Core, asyncio
messages: 310226
nosy: asvetlov, giampaolo.rodola, njs, yselivanov
priority: normal
severity: normal
status: open
title: Deprecate sys.set_coroutine_wrapper and replace it with more focused 
API(s)
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32591>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to