On Fri, May 28, 2021 at 2:03 AM Steven D'Aprano <[email protected]> wrote:
> I'll admit I'm not an expert on the various LOAD_* bytecodes, but I'm
> pretty sure LOAD_FAST is used for local variables. Am I wrong?

You're correct, but I dispute that that's the best way to do things.

> Right. In principle we could just shove the static values in the
> __defaults__ tuple, but it's probably better to use a distinct
> __statics__ dunder.

Right, agreed.

> If you don't store the values away somewhere on function exit, how do
> you expect them to persist from one call to the next? Remember that they
> are fundamentally variables -- they should be able to vary from one call
> to the next.
>
>
> > and it could cause extremely confusing results with
> > threading.
>
> Don't we have the same issues with globals, function attributes, and
> instance attributes?

Your proposal requires that every static involve a load at function
start and a store-back at function end. My proposal requires that they
get loaded directly from their one true storage location (a dict, or
some kind of cell system, or whatever) attached to the function, and
stored directly back there when assigned to. Huge difference.
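
To make the contrast concrete, here are the two semantics written out
by hand (the statics dict is stand-in scaffolding, not the proposed
implementation):

statics = {"n": 0}

def yours():
    n = statics["n"]     # load once, at function start
    n += 1
    ...                  # everything in here sees only the local copy
    statics["n"] = n     # store-back, at function end

def mine():
    statics["n"] += 1    # load and store happen immediately, right here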

With your proposal, two threads that are *anywhere inside the
function* can trample over each other's statics. Consider:

def f():
    static n = 0
    n += 1
    ...
    ...
    ...
    ...
    # end of function

Your proposal requires that the "real" value of n not be updated until
the function exits. What if that takes a long time to happen - should
the static value remain at its old value until then? What if it never
exits at all - if it's a generator function and never gets fully
pumped?
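
Here's that spelled out with a hand emulation (hypothetical _statics
dict standing in for the proposed storage):

_statics = {"n": 0}

def gen():
    n = _statics["n"]    # "load at function start"
    n += 1
    yield n
    _statics["n"] = n    # "store-back at function end" - never reached
                         # unless the generator is run to completion

g = gen()
print(next(g))           # 1 - the local copy was bumped...
print(_statics["n"])     # 0 - ...but the shared value never changed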

Mutating a static needs to happen immediately.

> I'm okay with saying that if you use static *variables* (i.e. they
> change their value from one call to the next) they won't be thread-safe.
> As opposed to static "constants" that are only read, never written.

They'll never be fully thread-safe, but it should be possible to hold a
short-lived lock around the mutation site itself and then work with an
actual stack-local copy. By your proposal, the *entire function* becomes
non-thread-safe, *no matter what you do with locks*. By my proposal,
this kind of code becomes entirely sane:

def f():
    static lock = Lock()
    static count = 0
    with lock:
        my_count = count + 1
        count = my_count
    ...
    ...
    ...
    print(my_count)
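
For comparison, roughly how that pattern looks today, with function
attributes standing in for the statics (the _lock and _count names are
my own):

from threading import Lock

def f():
    with f._lock:
        f._count += 1        # written back at once, while the lock is held
        my_count = f._count  # snapshot into a genuine local
    ...                      # long-running, lock-free work
    print(my_count)

f._lock = Lock()
f._count = 0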

There's a language guarantee with every other form of assignment that
it will happen immediately in its actual target location. There's no
write-back caching anywhere else in the language. Why have it here?

> But if you have a suggestion for a thread-safe way for functions to
> keep variables alive from one call to the next, that doesn't suffer a
> big performance hit, I'm all ears :-)

The exact way that I described: a function attribute and a dedicated
opcode pair :)
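
Closure cells are the closest existing analogue - they already get a
dedicated opcode pair, which you can see for yourself:

import dis

def make():
    n = 0
    def f():
        nonlocal n
        n += 1
        return n
    return f

dis.dis(make())   # shows LOAD_DEREF / STORE_DEREF for n, where statics
                  # would presumably use LOAD_STATIC / STORE_STATIC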

> > Agreed, I'd use it too. But I'd define the semantics slightly differently:
> >
> > * If there's an expression given, evaluate that when the 'def'
> > statement is executed, same as default args
>
> That's what I said, except I called it function definition time :-)

Yep, that part we agree on.

> > * Otherwise it'll be uninitialized, or None, bikeshedding opportunity, have 
> > fun
>
> I decided on initialising them to None because it is more convenient to
> write:
>
>     if my_static_var is None:
>         # First time, expensive computation.
>         ...
>
> than the alternative with catching an exception. YMMV.
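
For reference, I'd guess the exception flavour would look something
like this (whether an uninitialized static raises NameError or
something else is part of the bikeshed; compute() is a placeholder):

try:
    my_static_var
except NameError:
    # First time, expensive computation.
    my_static_var = compute()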

Not hugely important; I'd be happy with either.

> > * Usage of this name uses a dedicated LOAD_STATIC or STORE_STATIC bytecode
> > * The values of the statics are stored in some sort of
> > high-performance cell collection, indexed numerically
>
> Isn't that what LOAD_FAST already does?

This would be a separate collection. The LOAD_FAST slots live in the
stack frame; the LOAD_STATIC cells would live on the function object.
But yes, the code could be pretty much identical.
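
Closures already demonstrate cells that belong to the function rather
than to any one frame:

def make_counter():
    n = 0
    def count():
        nonlocal n
        n += 1       # an immediate cell store, shared across calls
        return n
    return count

counter = make_counter()
print(counter(), counter())                  # 1 2
print(counter.__closure__[0].cell_contents)  # 2 - the cell hangs off
                                             # the function object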

> > It would be acceptable to store statics in a dict, too, but I suspect
> > that that would negate some or all of the performance advantages.
> > Whichever way, though, it should ideally be introspectable via a
> > dunder attribute on the function.
>
> Maybe I'm misunderstanding you, or you me. Let me explain further
> what I think can happen.
>
> When the function is being executed, I think that static variables
> should be treated as local variables. Why build a whole second
> implementation for fast cell-based variables, with a separate set of
> bytecodes, to do exactly what locals and LOAD_FAST does? We should reuse
> the existing fast local variable machinery, not duplicate it.

Because locals are local to the invocation, not the function. They're TOO local.

> The only part that is different is that those static locals have to be
> automatically initialised on function entry (like parameters are), and
> then on function exit their values have to be stored away in a dunder so
> they aren't lost (as plain local variables are lost when the function
> exits). That bit is new.

Right, except that the mutations have to happen immediately.

> We already have a mechanism to initialise locals: it's used for default
> values. The persistent data is retrieved from the appropriate dunder on
> the function and bound to the local variable (parameter). We can do the
> same thing. We will probably use a different dunder.
>
> That doesn't mean that every single access to the local static variable
> needs to be retrieved from the dunder, that would likely be slow. Only
> on function entry.

I'm not sure that it would have to be all that slow; global lookup has
to do a lot more work than static lookup would. But that would be
something to measure.
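
A quick and dirty way to get a feel for it, using a closure cell as a
stand-in for the proposed static cell (absolute numbers will vary):

import timeit

g = 0
def via_global():
    return g

def make():
    c = 0
    def via_cell():
        return c
    return via_cell
via_cell = make()

print("global:", timeit.timeit(via_global))
print("cell:  ", timeit.timeit(via_cell))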

> The difference between function default arguments and statics is that if
> you rebind the parameter, that new value doesn't get written out to the
> __defaults__ dunder on function exit. But for statics, it should be.
>
> Are we on the same page here?

No, because function exit is too late.

> > Semantically, this would be very similar to writing code like this:
> >
> > def count():
> >     THIS_FUNCTION.__statics__["n"] += 1
> >     return THIS_FUNCTION.__statics__["n"]
> > count.__statics__ = {"n": 1}
> >
> > except that it'd be more optimized (and wouldn't require magic to get
> > a function self-reference).
>
> The most obvious optimization is that you only write the static value
> out to the dunder on function exit, not on every rebinding operation.

That's too aggressive an optimization. Consider this function:

def walk(tree):
    if tree is None: return
    static idx = 0
    walk(tree.left)
    idx += 1
    print(idx, tree.data)
    walk(tree.right)

Suppose that, during the recursive call down the left tree, idx gets
incremented five times. Now we return out of there and come back to
the original invocation. What should the value of idx be? By your
semantics, it's been loaded at the very start of the function, so
it'll still be zero! And at the end of the function, regardless of the
recursive calls to either left or right, a 1 will be written out to
the static. You've effectively made statics utterly useless for
recursion.
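
You can watch it go wrong by emulating the write-back by hand
(hypothetical statics dict, tiny three-node tree just for
illustration):

from types import SimpleNamespace as Node

statics = {"idx": 0}

def walk(tree):
    idx = statics["idx"]     # "load at function entry"
    if tree is not None:
        walk(tree.left)
        idx += 1             # based on the stale entry-time value
        print(idx, tree.data)
        walk(tree.right)
    statics["idx"] = idx     # "store-back at exit" clobbers the
                             # children's updates

def leaf(d):
    return Node(left=None, data=d, right=None)

walk(Node(left=leaf("a"), data="b", right=leaf("c")))
# prints "1 a", "1 b", "2 c" - garbled numbering
print(statics["idx"])        # 1, even though three nodes were visited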

In contrast, directly loading and writing the statics has the exact
semantics of every other load/store in the language - it happens
immediately at its correct location, atomically, and can be relied
upon.

ChrisA