[Python-Dev] Re: PEP 667: Consistent views of namespaces

2021-08-21 Thread Guido van Rossum
Hopefully anyone is still reading python-dev.

I'm going to try to summarize the differences between the two proposals,
even though Mark already did so in his PEP. But I'd like to start by
calling out the key point of contention.

Everything here is about locals() and f_locals in *function scope*. (I use
f_locals to refer to the f_locals field of frame objects as seen from
Python code.) And in particular, it is about what I'll call "extra
variables": the current CPython feature that you can add *new* variables to
f_locals that don't exist in the frame, for example:

def foo():
    x = 1
    locals()["y"] = 2  # or sys._getframe().f_locals["y"] = 2

My first reaction was to propose to drop this feature, but I realize it's
kind of important for debuggers to be able to execute arbitrary code in
function code -- assignments to locals should affect the frame, but it
should also be possible to create new variables (e.g. temporaries). So I
agree we should keep this.
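Here is a minimal sketch, assuming CPython, of the debugger pattern described above. User code runs against the frame's globals and f_locals, it creates a new temporary, and that temporary can then be read back through f_locals. The function name `debuggee` and the variable `tmp` are hypothetical. The sketch goes through f_locals rather than locals() because both PEPs make locals() in function scope return a snapshot.

```python
import sys


def debuggee():
    x = 1
    frame = sys._getframe()
    # Roughly what a debugger does with a user-typed statement: run it
    # with the frame's globals and f_locals as the namespaces.
    exec("tmp = x + 1", frame.f_globals, frame.f_locals)
    # "tmp" is an extra variable: it is not one of the frame's proper
    # variables, but it can still be read back through f_locals.
    return frame.f_locals["tmp"]


print(debuggee())  # 2
```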

Terminology-wise, I will refer to variables that are allocated in the frame
(like "x" above, and including nonlocals/cells) as "proper" variables.

Both PEPs give up when it comes to locals(), declaring it to return a
snapshot in this case. This is mostly to ensure better backwards
compatibility, since existing code calling locals() may well assume it's a
dict. Both PEPs make f_locals some kind of proxy that gives a direct
read-write view on the variables in the frame (including cells used for
nonlocal references), but they differ in the precise semantics.

So apparently the key difference of opinion between Mark and Nick is about
f_locals, and what to do with extras. In Nick's proposal when you reference
f.f_locals twice in a row (for the same frame object f), you get the same
proxy object, whereas in Mark's proposal you get a different object each
time, but it doesn't matter, because the proxy has no state other than a
reference to the frame. In Mark's proposal, if you assign a value to an
extra variable, it gets stored in a hidden dict field on the frame, and
when you read the proxy, the contents of that hidden dict field get
included. This hidden dict is lazily created on the first store to an extra
variable. (Mark shows pseudo-code to clarify this; the hidden dict is
stored as _extra_locals on the frame.)
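The following toy model makes the semantics summarised above concrete. It is neither Mark's pseudo-code nor CPython's implementation. `ToyFrame`, `ToyLocalsProxy` and the `UNBOUND` sentinel are hypothetical names, and "bytecode" is simulated by writing to `frame.fast` directly. The point is that a proxy holds nothing except a frame reference, so two proxies for the same frame cannot disagree.

```python
from collections.abc import Mapping

UNBOUND = object()  # hypothetical marker for a proper variable with no value yet


class ToyFrame:
    """Hypothetical stand-in for a function frame (not a CPython API)."""

    def __init__(self, **proper):
        self.varnames = tuple(proper)  # names of the "proper" variables
        self.fast = list(proper.values())  # their storage slots
        self._extra_locals = None  # hidden dict, created on first extra store


class ToyLocalsProxy(Mapping):
    """Stateless view: every read and write goes straight to the frame."""

    def __init__(self, frame):
        self._frame = frame  # the proxy's only state

    def __getitem__(self, name):
        f = self._frame
        if name in f.varnames:
            value = f.fast[f.varnames.index(name)]
            if value is UNBOUND:
                raise KeyError(name)
            return value
        if f._extra_locals is not None and name in f._extra_locals:
            return f._extra_locals[name]
        raise KeyError(name)

    def __setitem__(self, name, value):
        f = self._frame
        if name in f.varnames:
            f.fast[f.varnames.index(name)] = value  # write through to the slot
        else:
            if f._extra_locals is None:
                f._extra_locals = {}  # lazily created on the first extra store
            f._extra_locals[name] = value

    def __iter__(self):
        f = self._frame
        for name, value in zip(f.varnames, f.fast):
            if value is not UNBOUND:
                yield name
        if f._extra_locals is not None:
            yield from f._extra_locals

    def __len__(self):
        # Recomputed on every call: nothing is cached, so nothing goes stale.
        return sum(1 for _ in self)


frame = ToyFrame(x=1, z=UNBOUND)
ToyLocalsProxy(frame)["y"] = 2  # extra variable -> hidden dict
frame.fast[0] = 10  # simulate bytecode storing to x
print(dict(ToyLocalsProxy(frame)))  # {'x': 10, 'y': 2}
```

In this model len() has to walk every slot on each call. That trade-off comes up again in Nick's reply.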

In Nick's proposal, there's a cache on the frame that stores both the
extras and the proper variables. This cache can get out of sync with the
contents of the proper variables when some bytecode is executed (for
performance reasons we don't want the bytecode to keep the cache up to date
on every store), so there's an operation to sync the frame cache
(sync_frame_cache(), it's not defined in which namespace this exists -- is
it a builtin or in sys?).

Frankly the description in Nick's PEP is hard to follow -- I am not 100%
sure what is meant by "the dynamic snapshot", and it's not quite clear
whether proper variables are copied into the cache (and if so, why).

There are also differences in the proposed C API changes, but the
differences there are solvable once we choose the semantics for f_locals.

Personally, I find Mark's proposed semantics for f_locals simpler --
there's no cache, only storage for extras, so there's nothing that can get
out of sync.

I would even consider making locals() return the same proxy -- this is
simpler and more consistent with module and class scopes, but it is less
backwards compatible, and locals() is used orders of magnitude more than
f_locals. (Also, we'd have to modify exec() and eval() to allow using a
non-dict as globals, which would require some deep changes in the
interpreter.)
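You can observe the exec()/eval() restriction from the parenthetical in current CPython. The locals argument may be any mapping, but globals must be a real dict, so a proxy object could not be passed as globals without interpreter changes. `UserDict` is used here only as a convenient mapping that is not a dict subclass.

```python
from collections import UserDict

namespace = UserDict(x=1)  # a mapping that is not a dict (sub)class

# Any mapping is accepted as *locals*...
print(eval("x + 1", {}, namespace))  # 2

# ...but *globals* must be a dict, so this raises TypeError.
try:
    eval("x + 1", namespace)
except TypeError as exc:
    print("TypeError:", exc)
```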

--Guido

PS. In Mark's PEP, there's a pseudo-code version of locals() that can give
a different result in class scope than the current CPython implementation:
Using __prepare__, a metaclass can provide a namespace to execute the class
body that's not a dict (subclass) instance. The current CPython behavior
and AFAICT Nick's PEP return that namespace from locals(), but Mark's
pseudo-code would return a snapshot copy. I think it's better to stick to
the current semantics (and I suspect Mark overlooked this edge case).
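The following small illustration of the edge case in the PS assumes current CPython behaviour as described there. A metaclass's `__prepare__` returns a mapping that is not a dict, and the class body checks what locals() gives back. `RecordingNamespace`, `Meta`, `Demo` and the `locals_was_namespace` attribute are hypothetical names.

```python
class RecordingNamespace:
    """A minimal non-dict mapping used as a class-body namespace."""

    def __init__(self):
        self.data = {}

    def __getitem__(self, key):
        return self.data[key]  # KeyError lets name lookup fall back to globals

    def __setitem__(self, key, value):
        self.data[key] = value

    def __delitem__(self, key):
        del self.data[key]


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return RecordingNamespace()

    def __new__(mcls, name, bases, ns, **kwargs):
        # Was locals() in the class body the namespace object itself?
        body_locals = ns.data.pop("body_locals")
        # type.__new__ needs a real dict, so copy the recorded contents.
        cls = super().__new__(mcls, name, bases, dict(ns.data), **kwargs)
        cls.locals_was_namespace = body_locals is ns
        return cls


class Demo(metaclass=Meta):
    body_locals = locals()


print(Demo.locals_was_namespace)  # True under the current semantics
```

According to the PS, Mark's pseudo-code would return a snapshot copy here instead, so this check would no longer hold.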

On Fri, Aug 20, 2021 at 8:23 AM Mark Shannon  wrote:

> Hi all,
>
> I have submitted PEP 667 as an alternative to PEP 558.
> https://www.python.org/dev/peps/pep-0667/
>
> Nick and I have agreed to disagree on the way to fix locals() and
> f_locals. We are both in agreement that it needs fixing.
>
> In summary, PEP 667 has roughly the same surface behavior as PEP 558 but
> is simpler and more consistent internally, at the expense of some minor
> C API backwards incompatibility issues.
>
> PEP 558 also has backwards incompatibility issues, but claims to be more
> compatible at the C API level.
>
> Cheers,
> Mark.

[Python-Dev] Re: PEP 667: Consistent views of namespaces

2021-08-21 Thread Nick Coghlan
On Sun, 22 Aug 2021, 10:47 am Guido van Rossum,  wrote:

> Hopefully anyone is still reading python-dev.
>
> I'm going to try to summarize the differences between the two proposals,
> even though Mark already did so in his PEP. But I'd like to start by
> calling out the key point of contention.
>
> Everything here is about locals() and f_locals in *function scope*. (I use
> f_locals to refer to the f_locals field of frame objects as seen from
> Python code.) And in particular, it is about what I'll call "extra
> variables": the current CPython feature that you can add *new* variables to
> f_locals that don't exist in the frame, for example:
>
> def foo():
>     x = 1
>     locals()["y"] = 2  # or sys._getframe().f_locals["y"] = 2
>
> My first reaction was to propose to drop this feature, but I realize it's
> kind of important for debuggers to be able to execute arbitrary code in
> function code -- assignments to locals should affect the frame, but it
> should also be possible to create new variables (e.g. temporaries). So I
> agree we should keep this.
>

I tried taking this feature out in one of the PEP 558 drafts, but doing so
broke the pdb test suite.



>
> So apparently the key difference of opinion between Mark and Nick is about
> f_locals, and what to do with extras. In Nick's proposal when you reference
> f.f_locals twice in a row (for the same frame object f), you get the same
> proxy object, whereas in Mark's proposal you get a different object each
> time, but it doesn't matter, because the proxy has no state other than a
> reference to the frame.
>

If PEP 558 is still giving that impression, I need to fix the wording: the
proxy objects are ephemeral in both PEPs. (The 558 text is slightly behind
the implementation on that point: the fast refs mapping is now stored on
the frame object, so it only needs to be built once.)

> In Mark's proposal, if you assign a value to an extra variable, it gets
> stored in a hidden dict field on the frame, and when you read the proxy,
> the contents of that hidden dict field get included. This hidden dict is
> lazily created on the first store to an extra variable. (Mark shows
> pseudo-code to clarify this; the hidden dict is stored as _extra_locals on
> the frame.)
>

PEP 558 works essentially the same way; the difference is that it uses the
existing locals dict storage rather than adding new storage just for
optimised frames.

> In Nick's proposal, there's a cache on the frame that stores both the
> extras and the proper variables. This cache can get out of sync with the
> contents of the proper variables when some bytecode is executed (for
> performance reasons we don't want the bytecode to keep the cache up to date
> on every store), so there's an operation to sync the frame cache
> (sync_frame_cache(), it's not defined in which namespace this exists -- is
> it a builtin or in sys?).
>

It's an extra method on the proxy objects. You only need it if you keep an
old proxy object around - if you always retrieve a new proxy object after
executing Python code, that proxy will refresh the cache when it needs to.
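The following is a toy model of the cache-plus-sync arrangement. It rests on simplifying assumptions made for this sketch rather than taken from PEP 558:

- Lookups and stores of proper variables go straight to the frame through a name-to-slot mapping that is built once and kept on the frame.
- Extras live in the frame's existing locals dict.
- len() and iteration read that dict, which a new proxy refreshes on creation.

`CachedFrame`, `CachedLocalsProxy`, `fast_refs` and `locals_dict` are hypothetical names; only `sync_frame_cache()` is named in this thread.

```python
from collections.abc import Mapping

UNBOUND = object()  # hypothetical marker for an unassigned proper variable


class CachedFrame:
    """Hypothetical frame: fast slots plus the existing locals dict."""

    def __init__(self, **proper):
        self.fast = list(proper.values())
        # name -> slot index, built once and kept on the frame
        self.fast_refs = {name: i for i, name in enumerate(proper)}
        self.locals_dict = {}  # cache of proper variables, plus any extras


class CachedLocalsProxy(Mapping):
    def __init__(self, frame):
        self._frame = frame
        self.sync_frame_cache()  # a fresh proxy refreshes the cache

    def sync_frame_cache(self):
        f = self._frame
        for name, i in f.fast_refs.items():
            if f.fast[i] is UNBOUND:
                f.locals_dict.pop(name, None)
            else:
                f.locals_dict[name] = f.fast[i]

    def __getitem__(self, name):
        f = self._frame
        if name in f.fast_refs:  # proper variable: read the slot directly
            value = f.fast[f.fast_refs[name]]
            if value is UNBOUND:
                raise KeyError(name)
            return value
        return f.locals_dict[name]  # extra variable

    def __setitem__(self, name, value):
        f = self._frame
        if name in f.fast_refs:
            f.fast[f.fast_refs[name]] = value
        f.locals_dict[name] = value  # keeps the cache current for this write

    def __iter__(self):
        return iter(self._frame.locals_dict)  # reads the cache

    def __len__(self):
        return len(self._frame.locals_dict)  # O(1) once the cache is synced


frame = CachedFrame(x=1, z=UNBOUND)
old = CachedLocalsProxy(frame)
frame.fast[1] = 3  # simulate bytecode binding z; the cache is not told
print(len(old))  # 1: the old proxy's cached view is stale
print(len(CachedLocalsProxy(frame)))  # 2: a new proxy syncs first
```

The stale len() is the out-of-sync state from Guido's summary. Calling old.sync_frame_cache(), or simply creating a new proxy, repairs it.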



> Frankly the description in Nick's PEP is hard to follow -- I am not 100%
> sure what is meant by "the dynamic snapshot", and it's not quite clear
> whether proper variables are copied into the cache (and if so, why).
>

Aye, Mark was a bit quicker with his PEP than I anticipated, so I've
incorporated the implementation improvements arising from his last round of
comments, but the PEP text hasn't been updated yet.


> Personally, I find Mark's proposed semantics for f_locals simpler --
> there's no cache, only storage for extras, so there's nothing that can get
> out of sync.
>

The wording in PEP 667 undersells the cost of that simplification:

"Code that uses PyEval_GetLocals() will continue to operate safely, but
will need to be changed to use PyEval_Locals() to restore functionality."


Code that uses PyEval_GetLocals() will NOT continue to operate safely under
PEP 667: all such code will raise an exception at runtime, and will need to
be rewritten to use a new API with different refcounting semantics. That's
essentially all code that accesses the frame locals from C, since we don't
offer supported APIs for that other than PyEval_GetLocals() (directly
accessing the f_locals field on the frame object is only "supported" in a
very loose sense of the word, although PEP 558 mostly keeps that working,
too).

This means the real key difference between the two PEPs is that Mark is
proposing a gratuitous compatibility break for PyEval_GetLocals() that also
means that the algorithmic complexity characteristics of the proxy
implementation will be completely different from those of a regular dict
(e.g. len(proxy) will be O(n) in the number of variables defined on the
frame rather than being O(1) after the proxy's initial cache update the way
it is in PEP 558).

If Mark's claim that PyEval_GetLocals() could not be fixed was true then I
would be more sympathetic to his